Foreword


Here is a story the French photographer Henri Cartier-Bresson liked to tell: 
during the Second World War, he was hiding in a shed in the middle of the 
nondescript German countryside surrounded by a mountain range. He spent 
weeks there, fearing for his life. Then one day he visualized the ocean behind 
the mountain range. And this completely transformed his experience. Not 
only his experience of the mountain range, but also of his general situation 
and of himself.¹ Mental imagery does have a huge influence on the way we
perceive, and on our mental life in general. 

This book is about mental imagery and the important work it does in our 
mental life. It plays a crucial role in the vast majority of our perceptual epi- 
sodes. It also helps us understand many of the most puzzling features of per- 
ception (like the way it is influenced in a top-down manner and the way 
different sense modalities interact). But mental imagery also plays a very 
important role in emotions, action execution, and even in our desires. In sum, 
there are very few mental phenomena that mental imagery doesn’t show up 
in—in some way or other. The hope is that if we understand what mental 
imagery is, how it works and how it is related to other mental phenomena, we 
can make real progress on a number of important questions about the mind. 

I wrote this book for an interdisciplinary audience. As it aims to combine 
philosophy, psychology, and neuroscience to understand mental imagery, I 
have not presupposed any prior knowledge in any of these disciplines. As a 
result, readers with no background in any of these disciplines can also follow 
the arguments. 

The book has many short chapters, organized into five parts. Part I is about 
mental imagery, whereas the rest of the book is about the role it plays in per- 
ception (Part II), multimodal perception (Part III), cognition (Part IV), and 
action (Part V). The chapters are (almost) all self-standing, so the reader can 
jump around freely, but it probably makes sense to read Part I first. 

A lot has been written about imagination in the philosophy of mind and 
beyond. Mental imagery is a much more basic (and also conceptually less 
problematic) concept than imagination. Nonetheless, much less attention has
been devoted to mental imagery. The aim of this book is to fill this gap and
put mental imagery at the forefront of our thinking about the mind.

1 I heard the story from Martine Franck, Cartier-Bresson’s widow, in 2004 in Locarno.

David Hume memorably said that “the memory, senses, and understanding 
are, [...], all of them founded on the imagination.”² My aim is to convince the
reader that “the memory, senses, and understanding are, all of them founded” 
on mental imagery. 

I started working on the topic of mental imagery in the Fall of 2006, when,
fresh out of my PhD, I gave my first graduate seminar at Syracuse University. And
mental imagery has been the primary focus of my research in the last eight
years or so. As a result, it is difficult to enumerate all those who helped me to
think through these issues and also the venues where I presented the material
of the book, but I'll try anyway.

I gave talks that were directly related to mental imagery at various confer- 
ences of the American Philosophical Association (one Eastern, one Pacific, 
two Central) as well as the University of Manchester, the University of British 
Columbia, Simon Fraser University, University of Geneva, various confer- 
ences of the American Society of Aesthetics, University of Cardiff, University 
of Sheffield (twice), University of Porto, University of Warwick, various
conferences and talks at Ruhr-Universität Bochum, University of Bergen, Berlin
School of Mind and Brain (twice), various conferences of the Association of 
the Scientific Study of Consciousness, University of Leeds, Institut Jean 
Nicod, Courtauld Institute, University of East Anglia, University of Urbino, 
University of Milan (twice), University of Exeter, University of Oslo, Barnard 
College, University of Salzburg (twice), University of Aix/Marseilles, Oxford 
University (twice), Southern Society of Philosophy and Psychology, University 
of North Carolina, Chapel Hill, Hebrew University, Jerusalem, Kirschberg 
Symposium, University of Southampton, University of Nijmegen, University 
of Toronto, University of Ghent, University of Torino, NYU Abu Dhabi, 
Bilkent University, various online talks during the COVID-19 pandemic, 
Washington University, Saint Louis, University of Kent, Humboldt-Universität,
Berlin, University of Krakow, University of Bristol, City University of New
York, Università della Svizzera italiana, LMU Munich, University of Lisbon.

Some versions of the arguments I presented in this book were published
in the following journals: Philosophical Transactions of the Royal Society B,
Cognition, Cortex, Philosophical Studies, Pacific Philosophical Quarterly,
Analysis, Journal of the American Society of Aesthetics, Ergo, Synthese,
Multisensory Research, Thought, Perception, i-Perception, Consciousness and
Cognition, The Monist, Frontiers in Psychology, Mind & Language, Journal of
Consciousness Studies, as well as in various edited volumes.

2 Hume: Treatise of Human Nature 1.4.7.3-4. I want to leave open the possibility that what Hume
meant by “imagination” is in fact closer to what I mean by “mental imagery.”

There is no way I can enumerate everyone who gave excellent feedback on
some of the material in this book, but I am especially grateful to those who
read and commented on either the entire book or almost the entire book:
Dustin Stokes, Neil Van Leeuwen, Jake Quilty-Dunn, Robert Briscoe, 
Dominic Gregory, Amy Kind, Peter Langland-Hassan, Santiago Echeverri, 
Geraldo Viera, Alma Barner, Adam Bradley, Bras Saad, Laura Silva, Jason 
Leddington, Nicolas Porot, Peter Fazekas, Carlota Serrahima, Brandon Ashby, 
Sarah Arnaud, Kris Goffin, Francesco Marchi, Stephen Gadsby, Alex Kerr, Oli 
Odoffin, Amanda Evans, Anna Ichino, Alex Geddes, Lu Teng, Manolo 
Martinez, Denis Buehler, Anya Farennikova, Jonathan Cohen, Mohan 
Matthen, Thomas Raleigh, Kevin Lande, Nick Wiltsher, Chris McCarroll, 
Dan Williams, Margot Strohminger, Craig French, Maarten Steenhagen, 
Patrick Butlin, Jacob Berger, Dan Cavedon-Taylor, Chiara Brozzo, Laura Gow, 
Andrea Blomkvist, Julian Bacharach, Jeremy Pober, Andrea Rivadulla, Grace 
Helton, as well as an anonymous referee for Oxford University Press. 

Finally, I am grateful to the team at Oxford University Press, especially 
Peter Momtschiloff, Jo Spillane, Jen Hinchliffe, and to Andrea Blomkvist for 
supplying the index. The work on this book was supported by the ERC 
Consolidator grant [726251], the FWF-FWO grant [GOE0218N], the FNS- 
FWO grant [G025222N] and the FWO research grant [GOC7416N]. 
PART I
MENTAL IMAGERY 


1
Mental Imagery in Psychology and Neuroscience


Close your eyes and visualize an apple. Got it? This is an example of mental 
imagery. It would be tempting to use this example to anchor the reader’s 
intuitive understanding of mental imagery instead of spending some time 
trying to figure out how we should understand mental imagery. William 
James famously started his discussion of attention with the sentence: “Everyone 
knows what attention is” (James 1890, p. 403). So it would be tempting to start 
this book by saying that “Everyone knows what mental imagery is” and just 
leave it at that. 

This strategy would be extremely problematic for a number of reasons. The 
most important of these is that there are very significant and well-documented 
differences between individuals when it comes to mental imagery: some peo- 
ple do not experience mental imagery at all—I will say a lot more about the 
philosophical significance of this in Chapter 3. Some others have very vivid 
mental imagery. So trusting that we all know intuitively what mental imagery 
is, is just lazy. 

“Mental imagery” is a technical term. The concept of mental imagery was first 
consistently used in the, then very new, discipline of empirical psychology at the 
end of the nineteenth century by psychologists like Francis Galton, Wilhelm 
Wundt, or Edward Titchener (Galton 1880; Titchener 1909; Wundt 1912). 

Technical terms are supposed to be used in a way that maximizes theoreti- 
cal usefulness. In this case, theoretical usefulness means that we should use 
“mental imagery” in a way that would help us to explain how the mind works. 
My aim is to use a vast amount of empirical data in order to understand age- 
old and deep philosophical issues. But in order to do so, we need to start with 
the empirical sciences. So how do the empirical sciences of the mind use the 
concept of mental imagery? 

My starting point is the definition used in a review article on mental 
imagery in the leading journal Trends in Cognitive Sciences: “We use the term 
‘mental imagery’ to refer to representations [...] of sensory information 
without a direct external stimulus” (Pearson et al. 2015, p. 590).¹ This
definition aims to capture how the concept of mental imagery is used in a 
number of empirical disciplines, including psychiatry, neuroscience, and psy- 
chology. It is also consistent with some famous characterizations of mental 
imagery; for example, the one by the all-star team of Stephen Kosslyn, 
Marlene Behrmann, and Marc Jeannerod, who write that: 


Visual mental imagery is “seeing” in the absence of the appropriate immedi- 
ate sensory input, auditory mental imagery is “hearing” in the absence of the 
immediate sensory input, and so on. Imagery is distinct from perception, 
which is the registration of physically present stimuli. 

(Kosslyn et al. 1995a, p. 1335) 


It is not entirely clear what the scare-quotes around the words “seeing” and
“hearing” mean (we'll come to that). But one thing to note already is that mental
imagery, in spite of the connotations of the word “image,” is not necessarily 
visual: mental imagery can be auditory, olfactory, and tactile as well: it can 
happen in all sense modalities, not just in vision. 

Another influential, albeit somewhat picturesque, characterization of men- 
tal imagery comes from the pioneer of mental imagery research, Roger 
Shepard: 


The relation of a mental image to its corresponding object is in some ways 
analogous to the relation of a lock to a key. [...] the lock can be externally 
operated only by its corresponding key [...] It may also be possible to oper- 
ate the lock, at least partially, by direct manipulation of its mechanism from 
the inside, in the absence of its external key. (Shepard 1978, p. 130) 


I am not entirely sure how much experience Shepard had with locks (as pick- 
ing a lock would also involve external intervention), but the spirit of this 
characterization is clear: mental imagery lacks external input. I will follow the 
usage of the concept of mental imagery in the empirical sciences and define 
mental imagery as perceptual representation that is not directly triggered by 
sensory input.² This definition needs some unpacking.


1 The part of the definition I edited out is “and the accompanying experience.” I will come back to
why this is a fair omission (and also in line with the consensus in psychology and neuroscience) in
Chapter 4.

2 I should add that there is no crystal-clear consensus in the empirical literature on the exact
definition of mental imagery. However, the Pearson et al. definition that I appropriate (with some minor
changes) is as representative as it gets. Some empirical researchers work with a narrower conception
(where some additional constraint of being top-down generated or being conscious is added). I will
address the plausibility of such constraints in Chapters 3, 4, and 5.

By sensory input or sensory stimulation I mean the activation of the sense
organ by external stimuli. Sensory stimulation is an event. So, in the visual
sense modality, sensory stimulation amounts to the light hitting the receptors
in the retina. Some perceptual processing starts with sensory stimulation.
But not all. Some perceptual processing is not directly triggered by sensory
stimulation.

By triggering a perceptual representation, I mean a simple causal process:
our perceptual system can get activated by the event of sensory stimulation
(for example, by the light hitting our retina). But it can also get activated in
the absence of sensory stimulation. It needs to be emphasized that even on
those occasions when it is sensory stimulation that directly triggers perceptual
representations, the sensory input gets elaborated and embellished in the
course of this perceptual processing—but what triggers this processing and
what gets elaborated and embellished is the sensory stimulation. In the case
of mental imagery, this perceptual processing (already very early cortical
processing, see, for example, Slotnick et al. 2005) is not directly triggered by
sensory stimulation, so whatever gets elaborated and embellished is not the
sensory stimulation.

By perceptual representation, I mean a representation in the perceptual
system. Some of our perceptual representations are triggered directly by
sensory input—this amounts to perception, or, as I will say, to “sensory
stimulation-driven perception.” And some others are not triggered directly by
sensory input—this amounts to mental imagery. Note that I am making a
distinction between perception and perceptual representation. Perceptual
representation is representation involved in perceptual processing, which is just
processing in the perceptual system. Perceptual representations get activated
when we perceive. But they also get activated when we have mental imagery.
Perceptual representation that is triggered directly by sensory input is sensory
stimulation-driven perception. Perceptual representation that is not triggered
directly by sensory input is mental imagery.

I characterized, and will continue to characterize, mental imagery as
perceptual representation not directly triggered by sensory input. But some may
complain that, strictly speaking, representations are not the kind of things
that get to be triggered by sensory input. Perceptual processing is the kind of
thing that may or may not be triggered by sensory input. And perceptual
representation is the representation used in perceptual processing. I am
perfectly happy with this way of formulating mental imagery as perceptual 
representation that is used in perceptual processing that is not directly trig- 
gered by sensory input. But for the sake of simplicity, I will continue to talk 
about perceptual representations not directly triggered by sensory input. 
I also think that my characterization of mental imagery is equivalent to saying 
that mental imagery is perceptual processing not directly triggered by sensory 
input, as long as we take this perceptual processing to be representational 
processing. I will use these two ways of characterizing mental imagery (per- 
ceptual representation not directly triggered by sensory input and perceptual 
processing not directly triggered by sensory input) interchangeably, as the 
perceptual representation in question is the representation used in perceptual 
processing and as the perceptual processing is representational. Either way, 
these perceptual representations that are not directly triggered by sensory 
input are bona fide representations that higher-level processes (perceptual 
and non-perceptual ones) can use and process further. I will say more about 
the kind of representations that constitute mental imagery in Chapter 6. 

But defining perceptual representation in terms of perceptual processing, 
and defining perceptual processing in terms of the perceptual system, leaves 
open how we should think about the perceptual system. Some parts of the 
processing of sensory stimulation are more clearly perceptual than others. 
Take the visual sense modality as an example—I will use mainly visual exam- 
ples in the next couple of chapters before turning to the importance of the 
complex interaction between the sense modalities. In humans and nonhuman 
primates, the main visual pathway connects neural networks in the retina to 
the primary visual cortex (V1) via the lateral geniculate nucleus (LGN) in the 
thalamus; outputs from V1 activate other parts of the visual cortex and are 
also fed forward to a range of extrastriate areas (like the secondary visual cor- 
tex (V2), V4/V8, V5/middle temporal area (MT), and so on) (Bullier 2004; 
Grill-Spector and Malach 2004; Van Essen 2004; Katzner and Weigelt 2013). 

While there may be some debates about whether some later stages of this
line of processing would count as perceptual, we can safely assume that early
cortical processing (that is, the earliest stages of processing following the
input) counts as perceptual processing.³ Throughout the book, I will take
early cortical processing as sufficient for perceptual processing and most of
the examples I will talk about involve early cortical processing. But I want to
leave open the possibility that non-early perceptual processing that is not
directly triggered by sensory input would also count as mental imagery.

3 It is important to emphasize that perceptual processing here is understood functionally and not
neuroanatomically. Given the enormous neural plasticity of the mind, activation of the primary visual
cortex is neither necessary nor sufficient for visual imagery (see, for example, Bridge et al. 2012). Even
if there is no activation in the primary visual cortex, but there is activation of some other early visual
cortical areas (for example, of MT/V5+ in the case of imagery of a moving object), we can still
conclude that there is visual imagery (see Kaas et al. 2010). See Chapter 14 for further details on this.

Finally, the concept of directness in my definition of mental imagery may
need some further clarification (and the same goes for the concept of
“appropriate immediate sensory input” (Kosslyn et al. 1995a, p. 1335; see also
Shepard and Metzler 1971) that has also been used to specify what mental
imagery lacks). The perceptual processing is triggered directly by sensory
input if it is triggered without the mediation of representations involved in
some top-down or lateral (perceptual or extra-perceptual) processes.

If the perceptual representation is triggered by something non-perceptual
(as in the case of closing our eyes and visualizing), it is not triggered directly
by the sensory input.

If the perceptual representation in the visual sense modality is triggered by
sensory input in the auditory sense modality (as in the case of the involuntary
visual imagery of your face when I hear your voice on the phone with my eyes
closed), the visual processing is triggered indirectly. It is triggered with the
mediation of some kind of auditory representation—I will call this form of
mental imagery “multimodal mental imagery,” see Chapter 13. A direct
trigger here would be visual input, but there is no visual input in this case. The
auditory input leads to an auditory representation and this auditory
representation laterally triggers the visual representation. This process is mediated
by the auditory representation. Hence, it is not direct. It counts as mental
imagery.

And if the visual representation of the center of the visual field is triggered
by input in the periphery of the visual field (say, because the center of the
visual field is occluded by an empty white piece of paper), then the visual
processing at the center of the visual field is, again, triggered indirectly, that is,
in a way mediated by the visual representation in the periphery (see
Chapter 8). A direct trigger would have to be sensory input at the center of
the visual field, but there is no such direct trigger in this case. The visual input
in the periphery leads to the visual representation of the contours in the
periphery. These visual representations trigger, laterally, the visual
representation of the contours at the middle of the visual field. This process is mediated
by the visual representation in the periphery. Hence, it is not direct. It counts
as mental imagery.
All three of these examples of perceptual processing count as mental
imagery, because in each of them the perceptual processing is not triggered
directly by the sensory input.

The directness or indirectness of the causal link between sensory input and 
perceptual processing plays a crucial role in the definition of mental imagery, 
so here is a quick and dirty way of keeping direct and indirect causal links 
apart in this context. In all sense modalities, we have a fairly clear idea of the 
hierarchy of processing. We have seen that, for example in the visual sense 
modality, information from the retina is processed in the lateral geniculate 
nucleus, then in the primary visual cortex (V1), the secondary visual cortex 
(V2), and then V4/V8, V5/MT. If the processing of a feature in, say, the pri- 
mary visual cortex is triggered in an entirely bottom-up manner, this is not 
mental imagery. If it is triggered in a top-down manner or laterally, it is men- 
tal imagery. Lateral triggering can happen from a different sense modality 
(say, audition) or from the same sense modality (when the V1 processing of a 
feature at the middle of the visual field is triggered by the V1 processing of 
other features on the left- and right-hand side of the visual field). It is
important that while we get mental imagery when the perceptual processing is
triggered laterally in this way, we do not get mental imagery if the perceptual
processing of the input is merely influenced or modified laterally: in that
case, the perceptual processing is still directly triggered by the sensory input.
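
The quick and dirty rule just described can be put in the form of a small sketch. This is only an illustration of the rule as stated above, not a model of any actual neural process; the names Route and classify, and the modulated_by parameter, are hypothetical and introduced here purely for the example.

```python
from enum import Enum


class Route(Enum):
    """How the processing of a feature was triggered (hypothetical labels)."""
    BOTTOM_UP = "bottom-up"   # by sensory input, up through the hierarchy
    TOP_DOWN = "top-down"     # by mechanisms higher up in the hierarchy
    LATERAL = "lateral"       # by another modality or another part of the field


PERCEPTION = "sensory stimulation-driven perception"
IMAGERY = "mental imagery"


def classify(trigger: Route, modulated_by: tuple = ()) -> str:
    """Classify perceptual processing by what triggered it.

    modulated_by lists routes that merely influenced the processing.
    It is deliberately ignored: lateral or top-down influence on
    processing that the input itself triggered does not make it imagery.
    """
    if trigger is Route.BOTTOM_UP:
        return PERCEPTION
    return IMAGERY


print(classify(Route.TOP_DOWN))   # closing your eyes and visualizing
print(classify(Route.LATERAL))    # a voice triggering imagery of a face
print(classify(Route.BOTTOM_UP, modulated_by=(Route.LATERAL,)))
```

The third call is the case of merely influenced processing: the trigger is still the sensory input, so the sketch returns perception, whatever else modified the processing along the way.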

We can also get mixed perception/imagery cases. For example, if seeing 
the purple paper makes me visualize a purple dinosaur, this perceptual repre- 
sentation may be directly triggered by the sensory input with respect to color, 
but not in any other respects. In this case, we see the color purple, but have 
mental imagery of the dinosaur. As we shall see in Chapter 9, these mixed 
cases are very important in everyday perception. 

In some of my earlier writings on mental imagery, I characterized mental 
imagery as perceptual processing not triggered by corresponding sensory 
input. While there is an obvious difference between these two ways of think- 
ing about mental imagery (in one it is the lack of directness, in the other it is 
the lack of correspondence that sets mental imagery apart), I take the two 
definitions to be co-extensive for the vast majority of cases. The official defini- 
tion of perceptual representation not directly triggered by sensory input is 
more general and that is why I am sticking with it, but thinking of mental 
imagery as perceptual representation that lacks correspondence with the 
sensory input can be helpful in some contexts. 

In the case of shape perception, for example, correspondence is easy enough 
to measure, given the retinotopy of the primary visual cortex. Correspondence 
here is simple spatial correspondence. The sensory stimulation is a fairly
straightforward event: light hitting my retina in a certain pattern. And what is 
supposed to correspond to (or fail to correspond to) this pattern of sensory 
stimulation is the patterns in early cortical perceptual processing. In the visual 
sense modality, this would be the retinotopic primary visual cortex. The 
primary visual cortex (and also many other parts of the visual cortex; see 
Grill-Spector and Malach 2004 for a summary) is organized in a way that is 
structurally homomorphic to the retina—it is retinotopic. If you are looking 
at a triangle, there is roughly a triangle-patterned activation of direction- 
sensitive neurons in your primary visual cortex. So we can assess in a simple 
and straightforward manner whether the retinotopic perceptual processing 
in the primary visual cortex corresponds to the activations of the retinal cells. 
In the case of mental imagery, we get no such correspondence. 

The retinotopy of the early visual cortices (and their equivalent in the other 
sense modalities, see, for example, Talavage et al. 2004) makes spatial corre- 
spondence an extremely convenient way of gaining evidence about whether a 
given perceptual representation is mental imagery or not. For many kinds of 
stimuli (for example, shape), if there is no spatial correspondence between 
the processing in the visual cortices and the input, then the former could not 
have been triggered directly by the latter. Direct triggering would mean ret- 
inotopic triggering and this would guarantee spatial correspondence. But, of 
course, we can get such correspondence even if the processing in the visual 
cortices is not triggered by the input directly (for example, if there is a corre- 
spondence by accident without any causal link between the corresponding 
features). And with stimuli other than very simple shapes, the concept of cor- 
respondence is not entirely clear (or not easily measurable). 
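
The asymmetry in this diagnostic can be made concrete with a minimal sketch. It assumes, purely for illustration, that retinal and cortical activation can each be summarized as a set of grid positions and that overlap is measured as intersection over union; the function names and the 0.8 threshold are hypothetical choices, not values taken from the empirical literature.

```python
def spatial_overlap(retinal: set, cortical: set) -> float:
    """Share of active positions that the two patterns have in common."""
    union = retinal | cortical
    if not union:
        return 1.0  # two empty patterns trivially correspond
    return len(retinal & cortical) / len(union)


def diagnose(retinal: set, cortical: set, threshold: float = 0.8) -> str:
    """Use spatial correspondence as evidence about direct triggering.

    A lack of correspondence rules out direct triggering by this input.
    Its presence does not establish direct triggering, since the match
    could be accidental.
    """
    if spatial_overlap(retinal, cortical) < threshold:
        return "not directly triggered by this input"
    return "consistent with direct triggering (not conclusive)"


# A crude triangle on a grid, as (column, row) positions.
triangle = {(0, 0), (1, 0), (2, 0), (1, 1)}

print(diagnose(triangle, triangle))   # matching cortical pattern
print(diagnose(triangle, {(5, 5)}))   # unrelated cortical pattern
```

Only the negative verdict is decisive in this sketch, which is why correspondence serves as a diagnostic tool here rather than as the definition.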

For all these reasons, I use the presence or absence of a direct causal link 
between input and perceptual processing as the mark of mental imagery, 
while acknowledging that the presence or lack of correspondence can be used 
diagnostically. The same goes for temporal correspondence (again, something 
easy to measure by assessing whether the activation of the early cortices 
follows the sensory stimulation quickly enough), a topic I will come back to 
in Chapter 12. 

The definition of mental imagery as perceptual processing that is not 
directly triggered by sensory input is noncommittal about a number of differ- 
ent points. It is silent on whether this perceptual processing is conscious 
or not, voluntary or not, and so on—I will return to these distinctions in 
Chapters 3 and 4. 
But I want to emphasize, even here, a crucial point that this definition of
mental imagery is neutral about. The definition of mental imagery as percep- 
tual processing that is not directly triggered by sensory input is an entirely 
negative definition. It does not tell us what it is that the perceptual processing 
of a certain feature is directly triggered by. If it is triggered, albeit indirectly, by 
sensory input in the same sense modality, this still counts as mental imagery, 
because the direct causal link is missing (mental imagery of this kind will play 
an important role in Chapter 8). If it is triggered laterally, by another sense 
modality, it counts as multimodal mental imagery (see Chapter 13). Finally, if 
the perceptual processing is triggered directly by higher-level mechanisms 
(not necessarily cognitive processes or beliefs or knowledge, but mechanisms 
that are higher up than early perceptual processing in the perceptual hierar- 
chy, see Chapter 11 for details), we also get mental imagery (this is what hap- 
pens when you close your eyes and visualize an apple). 

One helpful metaphor used by neuroscientists is that of an “active black- 
board” (Girard et al. 2001; Bullier 2001, 2004; Sterzer et al. 2006; Roelfsema 
and de Lange 2016). The general idea is that the early visual cortices (and 
especially V1) function as a blackboard. Various processes can write on this 
blackboard. Sensory—that is, retinal—stimulation automatically leaves traces 
on this blackboard, and does so in a retinotopic manner. To simplify a bit, 
what is on the retina is copied onto the blackboard. 

When the retina is copied onto the blackboard, it is sensory stimulation- 
driven perception. When any other mental process draws on the blackboard, 
it is mental imagery. Some of these mental processes are determined by the 
sensory stimulation that shows up on other parts of the blackboard. Some are 
determined by perceptual processing in a different sense modality. And in 
some other cases the drawing is done by mechanisms further up in visual 
processing (Mechelli et al. 2004; Dentico et al. 2014). 
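
For readers who find it helpful, the blackboard metaphor can be sketched as a toy data structure. This is a deliberately simplified illustration under strong assumptions: the blackboard is a one-dimensional row of cells, the retina has the same number of positions, and every entry records who wrote it. The class Blackboard and its methods are hypothetical names introduced only for this example.

```python
PERCEPTION = "sensory stimulation-driven perception"
IMAGERY = "mental imagery"


class Blackboard:
    """A toy 'active blackboard': cells that various processes can write on."""

    def __init__(self, size: int):
        # Each cell holds None (blank) or a (feature, source) pair.
        self.cells = [None] * size

    def copy_retina(self, retina: list) -> None:
        """Retinotopic copy: position i on the retina goes to cell i.

        Assumes the retina list is no longer than the blackboard.
        """
        for position, feature in enumerate(retina):
            if feature is not None:
                self.cells[position] = (feature, "retina")

    def draw(self, position: int, feature: str, source: str) -> None:
        """Any other process (lateral, cross-modal, top-down) drawing a feature."""
        self.cells[position] = (feature, source)

    def label(self, position: int) -> str:
        entry = self.cells[position]
        if entry is None:
            return "blank"
        return PERCEPTION if entry[1] == "retina" else IMAGERY


board = Blackboard(5)
# The center of the visual field (position 2) is occluded: no retinal input there.
board.copy_retina(["contour", "contour", None, "contour", "contour"])
# The periphery laterally triggers a contour at the center.
board.draw(2, "contour", source="lateral")

labels = [board.label(i) for i in range(5)]
print(labels)
```

In this sketch the classification depends only on who wrote on a cell, not on what was written there: the same contour counts as perception in the periphery and as mental imagery at the occluded center.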

This is a very helpful metaphor, but it should be added that this is only
one aspect of what happens in visual cortical areas (see Chapter 5 for more
nuance about the layers of the early visual cortices). Nonetheless, mental
imagery, defined in the psychological/neuroscientific way as perceptual
processing that is not directly triggered by sensory input, can be understood as
those processes that add to the already existing (retinally drawn) drawing on
the blackboard of early cortical areas.

The question I will address in the next chapter is what this concept has to 
do with our everyday concept of mental imagery (if there is such a thing).


2 
Mental Imagery in Philosophy 


The concept of mental imagery is not devoid of colorful connotations: it 
brings to mind imagination, little images, pictures appearing before the 
“mind’s eyes,” and so on. This is especially confusing as mental imagery is a 
technical term and, as we have seen, we should use it in a way that is maxi- 
mally theoretically useful. Hence, in order to arrive at a workable conception 
of mental imagery, we should carefully remove these unwanted connotations 
of the concept—a task not entirely trivial, given that throughout the history of 
philosophy people have often used terms like “imagination,” “visualizing,”
“seeing in the mind’s eye,” or “images” interchangeably.

It is not my aim here to give a comprehensive history of the concept of men- 
tal imagery in the history of philosophy—I’m sure this would be an interesting 
project, but it is not my project. It would be an interesting project because the 
technical term of mental imagery was not systematically used until the end of 
the nineteenth century, and throughout the history of philosophy people often 
used the term “imagination” to refer to what we now would describe as mental 
imagery. Thomas Hobbes, for example, talked about “retaining an image of the
seen thing,” which comes very close to at least a subcategory of the current use
of mental imagery in psychology and neuroscience, but he referred to this men- 
tal phenomenon as imagination (Hobbes 1651, chapter 2). More generally, both 
the (British) empiricists and the (German) idealists used the term “imagina- 
tion” at least sometimes in the sense that would be captured by the concept of 
mental imagery nowadays (see Yolton 1996 for a summary). If we want to 
understand the evolution of philosophical thinking about mental imagery, we 
would need to go through all the historical texts about imagination and sepa- 
rate out references to voluntary acts (imagination proper) from references to 
perceptual representations that are not triggered directly by sensory input 
(mental imagery). Again, this is not something I intend to do here. 

Mental imagery has been surprisingly ostracized in the last few decades of 
philosophical thinking about the mind." It is not clear why this is—maybe it 


1 There are, of course, important exceptions (Richardson 1969; Currie 1995a; Kind 2001). A less-obvious
one is David Kaplan’s use of “images” in his way of characterizing “vivid names,” a crucial
aspect of understanding propositional representation (see Kaplan 1968, esp. p. 411).
is one of the side-effects of the “linguistic turn” of analytic philosophy. If we
are trying to understand how the mind works by focusing on language, then 
mental imagery (a non-linguistic mental representation par excellence) is 
likely to fall by the wayside. 

I want to focus on two approaches that were both highly influential and 
also particularly helpful for showing that the philosophical concept of mental 
imagery is not too far removed from the one I zeroed in on in Chapter 1. 

Here is a famous and classic definition from more than half a century ago: 


Mental imagery refers to all those quasi-sensory or quasi-perceptual experi- 
ences [...] which exist for us in the absence of those stimulus conditions that 
are known to produce their genuine sensory or perceptual counterparts, and 
which may be expected to have different consequences from their sensory or 
perceptual counterparts. (Richardson 1969, pp. 2-3) 


Although I'm not sure that what Richardson means by stimulus condition is 
the same as what I mean by sensory stimulation (the former seems to be a 
state and something external, the latter is an event and happens to our sense 
organ), it seems that the general gist of Richardson's definition is not that far 
away from the way psychologists and neuroscientists think about mental 
imagery—with one exception. Richardson talks about “quasi-sensory or 
quasi-perceptual experiences.” 

So mental imagery, according to Richardson, is by definition conscious (as 
experiences are supposed to be conscious mental states). And many other 
philosophers agree (see, for example, Thompson 2007). Here is Peter Kung’s 
definition, which is just one example of what I take to be the standard way of 
characterizing mental imagery in philosophy: “a sight or ‘picture’ in your 
mind’s eye, a sound in your mind’s ear” (Kung 2010, p. 622). This way of 
thinking about mental imagery seems to imply that mental imagery is neces- 
sarily conscious. 

In contrast, if we understand mental imagery the way it is used in psychol- 
ogy and neuroscience, there is no such restriction. If you have perceptual 
processing that is not directly triggered by sensory input, you have mental 
imagery, regardless of whether this perceptual processing is conscious. I will 
say (much) more on unconscious mental imagery in Chapter 4. 

Further, it is notoriously unclear how one should interpret the phrase 
“quasi-perceptual” in this context. Gregory Currie’s definition is a helpful 
development in this respect: 
Episodes of mental imagery are occasions on which the visual system is
driven off-line, disconnected from its normal sensory inputs and experien- 
tial outputs. (Currie 1995a, p. 26) 


Currie cashes out the “quasi-perceptual” nature of mental imagery in terms of 
a common mechanism: visual imagery and visual perception both use the 
visual system, but the latter does so “on-line,” whereas the former does so
“off-line.” While “on-line” and “off-line” are metaphors, if we try to substantiate
them, what we get is something very similar to the psychological/neuroscien- 
tific definition, as a straightforward way of understanding the difference 
between “on-line” and “off-line” perceptual processing is that the former, but 
not the latter, is directly triggered by sensory input. As Currie says, this difference
is whether the perceptual processing is “disconnected from its normal
sensory inputs.” 

For Currie, mental imagery is the functioning of our visual system discon- 
nected from its normal sensory inputs. For psychologists and neuroscientists, 
mental imagery is perceptual processing not directly triggered by sensory 
input. Substitute “disconnected from” for “not triggered by” and “normal” for 
“direct” and it may seem that we get almost the same definition. 

On the other hand, Currie (1995a) makes it clear (although not in the defi- 
nition I quoted above) that he takes mental imagery to be a kind of 
experience—that is, as something necessarily conscious, as does Richardson 
(and, as we have seen, also some other philosophers). 

This emphasis on conscious mental imagery may remind one of the way 
the concept of mental imagery was used at the time when it was first intro- 
duced at the end of the nineteenth century. At that time, psychologists like 
Francis Galton, Wilhelm Wundt, or Edward Titchener (Galton 1880; 
Titchener 1909; Wundt 1912) thought of mental imagery as a mental phe- 
nomenon characterized by its phenomenology—a quasi-perceptual episode 
with a certain specific phenomenal feel. This stance led to serious suspicion, 
and often the outright rejection, of this concept in the following decades 
when behaviorism dominated the psychological discourse (Kulpe 1895; Ryle 
1949; Dennett 1969). It was not until the 1970s that mental imagery was again 
considered to be a respectable concept to study in the empirical sciences of 
the mind—by cutting the ties with the conscious phenomenology of imagery. 
This opened the door to the possibility of unconscious mental imagery and, 
as we shall see in Chapter 4, we have strong empirical and conceptual reasons 
to maintain that mental imagery can indeed be unconscious. 
Mental imagery is a psychological phenomenon. And, as is normally the
case with psychological phenomena, introspecting is not a very reliable guide 
to them. As we have seen at the beginning of Chapter 1, it is tempting to take 
mental imagery to be something we all know about—all we need to do is close 
our eyes and visualize an apple and then introspect what is going on in “our 
mind’s eye.” And much of the philosophical research (and much early psychological
research) on mental imagery followed this methodology.

I don't want to do that. I think closing our eyes and introspecting will give a 
very limited insight into what mental imagery is and how it works. So I want 
to move away from the introspective concept of mental imagery to the psy- 
chological concept of mental imagery, which would be something we can 
characterize in a way that does not rely on our introspection or our experi- 
ence in general. And the definition that psychologists and neuroscientists use 
does exactly this: it refers to perceptual processing and sensory stimulation 
and the relation between the two. 

I should add that when we move away from our introspective concept of 
mental imagery towards a psychological concept, we should not throw the 
original introspective concept out of the window. Introspection is often unre- 
liable, but it is also what makes us care about the phenomenon in question 
(mental imagery in this case) to begin with. And all things considered, it 
would be preferable to have a psychological concept of mental imagery that is 
not in conflict with the introspective concept of mental imagery. 

I defined mental imagery as perceptual representation not directly trig- 
gered by the sensory input. Some attractive features of this definition need to 
be pointed out. First of all, psychology and neuroscience have a lot to say 
about mental imagery. If we philosophers want to be able to communicate 
about these issues with empirical scientists, it is a good idea to use their ter- 
minology. Understanding mental imagery the way psychologists understand 
mental imagery can give us a lot of ammunition in our philosophical argu- 
ments about mental imagery—in fact, the aim of this book is to do just that. 

Second, according to this definition, mental imagery does not have any- 
thing to do with looking at tiny pictures in our mind (an idea behaviorists, 
and especially Gilbert Ryle, were making fun of; see Ryle 1949, chapter 8). 
Mental imagery is not something we see: it is a certain kind of perceptual 
processing. So it is in no way more mysterious than other kinds of perceptual 
processing (like perception proper). Nor do we need to postulate any onto- 
logically extravagant entities (like tiny pictures in our head) to talk about 
mental imagery any more than we need to postulate these entities in order to 
talk about perception. 
Third, mental imagery is much richer than the sketch we would draw if we
had to draw the image that we see in the mind’s eye. One argument about the 
relation between mental imagery and imagination, which I will return to in 
Chapter 22, is about how different imaginative episodes may use the very 
same mental imagery. Here is one often-cited example (originally from 
Peacocke 1985, pp. 19-20; see also Kung 2010, esp. p. 626): Visualizing a suit- 
case and visualizing a cat hiding behind a suitcase brings up the very same 
mental imagery: that of a suitcase.² But what we imagine is very different.
This is not a very helpful way of thinking about mental imagery, partly 
because there are enormous interpersonal variations about whether these two 
imaginative episodes in fact bring up the same conscious mental picture, and 
partly because it conflicts with what we know about the neuroscience of men- 
tal imagery. We have strong empirical reasons to think that the perceptual 
processing in the early visual cortical areas, when we visualize a suitcase and 
when we visualize a cat behind a suitcase, is very different (Kosslyn et al.
1995a; O’Craven and Kanwisher 2000). From the point of view of this defini- 
tion of mental imagery, it is irrelevant that the philosopher conjures up the 
same image consciously. It gives us no reason whatsoever to conclude that the 
mental imagery is the same. 

Finally, this way of thinking about mental imagery makes it possible to talk 
about the mental imagery of a subject without having to rely on her (notori- 
ously unreliable) introspective reports. Suppose that I put you in an fMRI 
scanner and map out your early visual cortices. I then detect the direction- 
sensitive neurons firing in a triangle shape in your primary visual cortex, but 
your retina has nothing on it as your eyes are closed, or maybe there are just 
some parts of a triangle on your retina, as in the case of looking at the Kanizsa 
triangle (see Figure 1). In this case, I can conclude that you have mental 
imagery of a triangle. It does not matter what you are introspecting. This gives 
us a concept of mental imagery that picks out a publicly observable phenome- 
non. It is as scientifically solid a concept as it gets. 

Nonetheless, and this is a further attractive feature of my definition, the
psychological/neuroscientific definition of mental imagery is not completely 
unrelated to the philosophical (and introspective) one. As I will argue in the 
next chapter, it is continuous with, and could be considered to be a straight- 
forward extension of, our everyday (and philosophical) conception of mental 
imagery. Just as the concept of time in the theory of special relativity is an 
extension of our everyday conception of time (in the sense that our everyday 


2 See Wiltsher (2016, pp. 267-8) for some dissent on this.
Figure 1 Kanizsa triangle


conception of time could be thought of as a special case of the concept of time 
in the theory of special relativity), the same is true of mental imagery: our 
everyday conception of mental imagery could be thought of as a special case 
of the concept of mental imagery outlined here. 


3 


Varieties of Mental Imagery 


Let’s go back to the example of closing your eyes and visualizing an apple.
This is undoubtedly one way of exercising our mental imagery: one that many 
philosophers and non-philosophers consider the standard and stereotypical 
way of having mental imagery. But it is not at all representative, in at least six 
respects. 

First, it is visual mental imagery. And vision is not the only sense modality. 
So if we can perceive auditorily, olfactorily, and so on, we can also have audi- 
tory, olfactory, tactile, etc. mental imagery. I call all these “mental imagery”— 
it should be clear that the word “imagery” does not here denote anything that 
has to do with pictures (which would usually be something visual): mental 
imagery exists in all sense modalities. 

Second, when I ask you to visualize an apple, it is something you do volun- 
tarily and intentionally. But mental imagery does not have to be voluntary 
(Pearson and Westbrook 2015). One can have flashbacks of some unpleasant 
scene—this is also mental imagery, but it is not voluntary mental imagery. 
And some of our mental imagery is of this involuntary kind—this is espe- 
cially clear in the auditory sense modality, as demonstrated by the phenome- 
non of earworms: tunes that pop into our heads and that we keep on having 
auditory imagery of, even though we do not want to. As Darwin says in 
Descent of Man (where it is clear from the context that he is talking about 
what I call mental imagery), “The Imagination [can] unite former images and 
ideas, independently of the will” (Darwin 1871, chapter 3). 

Third, when you visualize an apple, you tend to do so in a detached nonactual
visualized space: you close your eyes and visualize an apple in this nonactual
space that has nothing to do with the space you occupy.¹ But this is not
necessarily so. One can also visualize the apple in one’s egocentric space, for 
example, in one’s hand or next to one’s laptop. Mental imagery can localize the 
imagined object in one’s egocentric space or in some detached nonactual 


1 As Wittgenstein says, “what is imaged is not in the same space as what is seen” (Zettel (Oxford:
Blackwell, 1967), 628; see also Chomanski 2018 for a good discussion of the relation between perceived
space and the space of mental imagery).
space. In fact, having mental imagery of something in our egocentric space is
not something unusual—we use mental imagery this way very often. When 
you are looking at your empty living room, thinking about what kind of fur- 
niture to buy, you're likely to try to form mental imagery of, say, a sofa not in 
a detached space “in the mind’s eye,” but in your living room. And when
you're trying to figure out whether this sofa would fit through the main 
entrance, again, you are having mental imagery of the sofa in the very con- 
crete space of the main entrance of your house (see Briscoe 2018 and many of 
the chapters in Kind and Kung 2016). 

Fourth, visualizing an apple is not normally accompanied by any feeling of 
presence. You are not fooled by this mental imagery into thinking that there is 
actually an apple in front of you so that you could reach out and grab it. But, 
again, this is not a necessary feature of mental imagery. There is no prima 
facie reason why mental imagery could not be accompanied by the feeling of 
presence (Simons et al. 2017). In fact, lucid dreaming (extremely vivid dreams 
where we seem to be able to control the dream content), which is widely con- 
sidered to be a form of mental imagery (see Hobbes 1655; Walton 1990 for a 
summary), is very much accompanied by the feeling of presence. And hallu- 
cination, which is, arguably, also a form of mental imagery, is also clearly 
accompanied by the feeling of presence (see Nanay 2016a for more discussion 
of hallucination as mental imagery). 

The fifth distinction is about how mental imagery is triggered. As we have 
seen in Chapter 1, our definition of mental imagery was a negative one: per- 
ceptual processing that is not directly triggered by sensory input. This defini- 
tion tells us what perceptual processing that constitutes mental imagery is not 
triggered by, but it is silent about what it is triggered by. 

When you close your eyes and visualize an apple, this is top-down trig- 
gered mental imagery. The imagery is triggered by higher-order mental states. 
There is no bottom-up trigger (as your eyes are closed) and no lateral trigger either
(assuming that you have sound-canceling earphones on and you get no
sensory stimulation from other sense modalities).

But this is not the only way in which mental imagery can be triggered. It 
can also be triggered by the sensory input in the very same sense modal- 
ity—as long as this sensory input triggers the perceptual processing indirectly. 
I will argue in Chapter 8 that this is exactly what happens in many familiar 
cases of amodal completion: visual mental imagery is triggered by retinal sen- 
sory stimulation around the amodally completed region, which means that 
perceptual processing in this amodally completed region is not directly trig- 
gered by the retinal sensory input. 
Finally, mental imagery can also be triggered laterally, by perceptual pro-
cessing in another sense modality. For example, visual mental imagery can be 
triggered in the absence of any visual sensory stimulation and also in the 
absence of any top-down influences by auditory perceptual processing. I will 
go through a number of examples of laterally triggered multimodal mental 
imagery in Chapter 13. 

I made five distinctions within the category of mental imagery: (a) visual
versus auditory versus olfactory versus gustatory, etc., (b) voluntary versus
involuntary, (c) egocentric space versus non-egocentric space, (d) feeling of
presence or no feeling of presence, and (e) top-down versus lateral. We can add
more distinctions that will be important in the discussion to follow, for example 
one between determinate and determinable imagery (as mental imagery is very 
often not at all determinate, see Chapter 10). But there is yet another distinc- 
tion, which is much more controversial than the ones I have discussed in this 
chapter. It is between conscious and unconscious mental imagery and I will 
spend the entirety of Chapter 4 giving empirical arguments in favor of uncon- 
scious mental imagery. I will get into these debates soon. But for the purposes 
of this chapter, I will assume a much weaker and completely uncontroversial 
claim only, namely, that mental imagery may be more or less vivid (Kind 2017). 

There is a recent body of research on subjects who report not having 
any conscious mental imagery whatsoever. This condition is called aphantasia 
(Zeman et al. 2007, 2010, 2015; Dawes 2020; Blomkvist forthcoming). A sur- 
prisingly large proportion of the population (according to some measures 5-8 
percent) have this condition. They lack conscious mental imagery: when they
close their eyes and try to visualize an apple, no image is conjured up (aphan- 
tasia is identified in terms of self-report—as we shall see in Chapter 4, it 
covers a diverse set of underlying phenomena). 

Aphantasia is one end of the spectrum. On the other end of the spectrum 
we find people with extremely vivid imagery experiences—a condition often 
called hyperphantasia. Most of us are somewhere in between. And we know a 
fair amount about the neuroscience of what the vividness and precision of 
mental imagery depend on. There is a linear correlation between the vividness 
of mental imagery and some straightforward (and very easily measurable) 
physiological features of the subject’s brain (such as the size of the subject's 
primary visual cortex and the relation between early cortical activities and the 
activities in the entire brain; see Cui et al. 2007; Bergmann et al. 2016). 

But these findings about aphantasia and hyperphantasia and the interper- 
sonal variations in the vividness of mental imagery should also give us some 
reasons to be suspicious about any kind of reliance on phenomenology and 
introspection when thinking about mental imagery (see also Kozhevnikov
et al. 2009 for some evidence on cross-cultural variations in the vividness of 
mental imagery). 

The interpersonal variations in imaginative phenomenology—the extreme 
case of which is demonstrated in the aphantasia research—highlight just how 
unlikely it is that anyone, even the most astute observer possible, could read 
off a plausible account of mental imagery from their own experiences as these 
experiences are very different from the experiences of many others. 

The same goes for other philosophical debates where mental imagery plays 
a role, like the cognitive phenomenology debate (Pitt 2004). According to the 
proponents of cognitive phenomenology, some conscious non-perceptual 
states have distinctive phenomenal character in the sense that it is “different 
from what it is like to be in any other sort of conscious mental state” (Pitt 
2004, p. 4). So the phenomenal feel that accompanies my belief is different in 
kind from the phenomenal feel that accompanies my perceptual states. Others 
argue that cognitive phenomenology is not distinctive—it really just derives 
from the phenomenology of quasi-perceptual states like mental imagery 
(both visual and auditory, see, for example, Carruthers 2005, esp. pp. 138-9). 

Many (not all) arguments on both sides are firmly grounded in introspec- 
tion: when you have a belief and you introspect, what kind of mental imagery 
(if any) is conjured up? If we take the interpersonal variations in terms of the 
vividness of mental imagery seriously, then this philosophical debate may not 
be as theoretically interesting as it may sound. People with less-vivid mental 
imagery will be likely to come down on the distinctive cognitive phenome- 
nology side as they will not be able to discern (sensory) mental imagery when 
they introspect their non-perceptual mental states and these people will not 
be drawn to explain the non-perceptual phenomenology in terms of the 
phenomenology of this sensory mental imagery. But others, with more vivid 
mental imagery, would be more likely to give an explanation of this kind.²

The method of just introspecting and coming up with a philosophical 
account of mental imagery (and of other mental phenomena) has had a good 
run in the history of philosophy. But given the interpersonal variations in the 
phenomenology of mental imagery (and in fact, of all kinds of other mental 
phenomena), it is just not a very promising option. 


2 It needs to be noted that this debate is in fact even more complex, as some of those who are
skeptical of the idea of cognitive phenomenology would deny that occurrent beliefs have any phenomenology
at all (not just that their phenomenology is distinct from perceptual phenomenology). I
am not sure that the vividness of mental imagery has a lot to do with this aspect of the debate (but see
Lennon forthcoming on the potential relevance of the aphantasia research here).
There is no carefully controlled psychological research about how philoso-
phers’ intuitions on cognitive phenomenology vary as a result of the vividness 
of their mental imagery. But there is carefully controlled psychological 
research about how psychologists’ (and philosophers’) intuitions vary as a 
result of the vividness of their mental imagery when it comes to the so-called 
Imagery Debate of the 1980s (the debate about whether the format of mental 
imagery is imagistic or propositional; see Kosslyn 1980; Pylyshyn 1981; Tye 
1991; see also Chapters 6 and 18). A fairly large study showed that the vividness
of imagery has a significant impact on theoretical commitments in this
debate (Reisberg et al. 2003). Researchers with less-vivid mental imagery 
were more likely to take the propositional side and those with more vivid 
mental imagery tended to come down on the imagistic side. 

Note that these considerations about the limits of introspection when it 
comes to mental imagery and imaginative phenomena in general are very dif- 
ferent from the familiar line in the empirical sciences (and in philosophy of 
mind) about the unreliability of introspection in general (see Schwitzgebel 
2008; Spener and Bayne 2010). Even if we take introspection to be fully reli- 
able, what we are introspecting in the case of mental imagery is very different 
in the case of different individuals. Someone closer to the aphantasia end of 
the spectrum and someone closer to the hyperphantasia end of the spectrum 
will (reliably) introspect something very different. 

These arguments are complemented nicely by Ian Phillips’s (empirically
based but philosophical) argument that the reason why there is a significant 
variation in people’s reports on their use of imagery is not that some of them 
use imagery and others don’t but that the imagery of some people tends to be 
conscious and the imagery of some others tends to be unconscious (Phillips 
2014; but see also Schwitzgebel 2002 and Chapter 4 for more on unconscious 
mental imagery). 

I want to go back to the many distinctions within the category of mental 
imagery and the great variety of mental imagery that these distinctions 
provide us with (I’ll set aside the one about the feeling of presence for now).
These distinctions are orthogonal to one another, so we get a lot of internal 
distinctions within this category. Mental imagery can be voluntary and vivid,
localizing its object in a non-egocentric space. Visualizing an apple is of
this kind. But it can also be involuntary and not at all vivid (or maybe even
unconscious), localizing its object in an egocentric space (which would be
the polar opposite of the kind of mental imagery that we have when we close 
our eyes and visualize the apple). This latter kind of imagery is what will play 
a crucial role in the next chapters. 
One way of seeing the relation between the concept of mental imagery
I use and the introspective concept is that we get the former from the latter 
by lifting some irrelevant restrictions—exactly those restrictions discussed in 
this chapter. So if we lift the restriction of voluntariness, no egocentricity and 
no feeling of presence (and consciousness), we end up with the psychological/ 
neuroscientific concept of mental imagery. 

Finally, it is very important to emphasize that my aim here is not to capture 
what the general public means by mental imagery. I do not want to give an 
ordinary language analysis of the term “mental imagery.” Even fans of ordinary
language analysis (I am not among them) should have strong reason not
to do so, as “mental imagery” is not an ordinary language term. In fact, none 
of the languages I am familiar with, other than English, has a term that would 
mean mental imagery (as distinct from “imagination” or “mental picture”). 
The term “mental imagery” is a technical term. I don’t think there are any 
pre-theoretical considerations that should persuade us to use the term one 
way or another. As a result, we should use it in a way that is theoretically 
fruitful. My aim was to show that using it in a way that is consistent with 
psychological/neuroscientific consensus is the most theoretically fruitful use 
of the term. 

Having said this, if you, the reader, mean something very different by men- 
tal imagery, this is not a reason to stop reading this book. My aim is to show 
that a certain mental phenomenon, namely the perceptual representations 
that are not directly triggered by sensory input, would play an important role 
in a variety of mental processes. I call these perceptual representations mental 
imagery. 

But if you don’t want to, call it something else. Call it mental imagery*. Or 
maybe off-line perception. Or phantom perception. Or use some other label. 
This book is not about how best to label such perceptual processes. It is about 
these perceptual processes, which I happen to label mental imagery. 


4 


Unconscious Mental Imagery 


There are three kinds of reasons to think that mental imagery may be con- 
scious or unconscious: conceptual, methodological, and empirical reasons. 

First, the conceptual reason. Perception can be conscious or unconscious. 
If the stimulus is masked or presented for a very short period of time, the 
subject still perceives it, but has no conscious experience of it (Kentridge et al. 
1999; Goodale and Milner 2004; Kouider and Dehaene 2007; Weiskrantz 
2009; there is some dissent on this, see below). But if perception per se can be 
unconscious, it would be completely ad hoc to postulate that mental imagery 
can’t be. Remember that mental imagery is a form of perceptual processing:
perceptual processing that is not directly triggered by sensory input. If per- 
ceptual processing that is directly triggered by sensory input can be uncon- 
scious, it is difficult to see why perceptual processing that is not directly 
triggered by sensory input (that is, mental imagery) would have to be 
conscious. 

The second reason is methodological. Most behavioral or neuroimaging 
experiments on mental imagery—including the most famous ones—often 
don't actually take the conscious experience of the subject into consideration. 
Take, for example, the famous mental rotation tasks, one of the most widely 
used paradigms in the study of mental imagery. There is a linear correspond- 
ence between the time required for deciding whether two three-dimensional 
shapes are the same and the degree of rotation between these two shapes 
(Shepard and Metzler 1971). Your task is to decide whether two complex 
three-dimensional shapes are the same. And you are quicker to respond (with 
a yes or no answer) if the two shapes are oriented in such a way that less men- 
tal rotation is required between them. 

Whatever these experiments say about mental imagery (and we can stay 
away from this question), it must be a claim that is silent about whether men- 
tal imagery is conscious. These experiments are response time experiments 
and the reasons for inferring the exercise of mental imagery are not intro- 
spective ones, but come from the timing of the subjects’ responses, for which 
they did not have to be conscious of any kind of mental imagery (although 
they obviously needed to be conscious of the task they were performing). The 
mental imagery involved in this task may or may not be conscious. Therefore,
the concept of mental imagery that mental rotation experiments are concerned
with shouldn’t (and can’t) have consciousness as a built-in feature.¹

The third reason is empirical: positing unconscious mental imagery can 
explain a number of empirical findings better than not positing unconscious 
mental imagery. I will give two arguments for this claim. 

The first one takes influential paradigms for studying unconscious percep- 
tion and modifies them in such a way that they are applicable to unconscious 
mental imagery. I will focus on priming studies and argue that we can use the 
same experimental paradigm to show that not only perception, but also mental
imagery can be unconscious.²

Some have recently expressed general doubt concerning the standards for 
when we can be absolutely certain that perception is unconscious (see Block 
and Phillips 2017 for a summary). These skeptics would not go along with the
claim that perception can be unconscious. These worries would be inherited 
by my arguments. If the skeptics were right (I don't think they are) that the 
priming studies fail to establish that perception can be unconscious, then my 
arguments wouldn't establish that mental imagery can be unconscious either. 
My second empirical argument from aphantasia is not susceptible to these 
worries. 

The first argument for unconscious mental imagery is from priming studies. 
An important set of findings that shows that perception can be unconscious 
involves unconscious priming: the subject’s behavior is altered by the unconsciously
presented stimulus. The general structure of the argument here is
that we can infer that the subject perceived something unconsciously if (a) 
the subject has no conscious awareness of the stimulus presented perceptually 
to her and (b) this unconscious presentation of the stimulus primes her (often 
in very similar ways as conscious presentation of the stimulus does). 

There is a large number of findings that follow this general pattern when it 
comes to showing that perception can be unconscious (Kentridge et al. 1999; 
Goodale and Milner 2004; Kouider and Dehaene 2007; Weiskrantz 2009). 
But the same general argument could be modified to show that mental 
imagery can be unconscious. In this case, we could infer that the subject had 
unconscious mental imagery if (a) the subject has no conscious awareness of 
her mental imagery, and (b) this unconscious mental imagery primes her in 
the same way conscious mental imagery does. 

1 There are, of course, experiments that do consider the subjects’ conscious experience, but the aim 
even in these experiments is to find correlations between conscious experience and publicly observable 
features of the subjects’ behavior (see, for example, Cui et al. 2007; Dijkstra et al. 2017a). 

2 There are additional important and influential considerations for unconscious perception, from 
unilateral neglect and from dorsal vision, both of which could be modified to show that there is 
unconscious mental imagery. I will not talk about these considerations here, but will instead focus on 
what I take to be stronger arguments for unconscious mental imagery (see Nanay 2021a for how the 
argument from unilateral neglect and the argument from dorsal vision would go). 

We have seen the general complications with (a) three paragraphs ago. But 
in the case of unconscious mental imagery, (b) is also more complicated than 
it looks. In the case of unconscious perceptual priming, as long as the subject's 
behavior is altered by the unconsciously presented stimulus the same way as it 
is altered by consciously seeing the stimulus, we can conclude that it was her 
unconscious perception that primed her behavior. But in the case of mental 
imagery, it is much more difficult to find behavior that would be primed by 
mental imagery, let alone unconscious mental imagery. 

Here is one experiment that could provide all the ingredients for the kind 
of argument I outlined above (Kwok et al. 2019). It is a binocular rivalry 
experiment. In the case of binocular rivalry, when different images are pre- 
sented to each eye, our visual experience alternates between these two images. 
If an image of a cat is presented to the left eye and an image of a dog to the 
right eye, your experience is not a composite of the two, but rather a quick 
switching back and forth between a cat and a dog. 

Short-term exposure to stimuli immediately before the binocular rivalry 
task influences the pattern of these alternating experiences (as long as the 
stimuli are not too strong, which leads to suppression, see Brascamp et al. 
2007). Suppose that an image of red vertical lines is presented to the right eye 
and an image of green horizontal lines is presented to the left eye. If you have 
been staring at a red wall before this task, your right eye (where red vertical 
lines are presented) is more likely to win out in the binocular rivalry—more 
than 50 percent of the time, you will experience the red vertical lines and not 
the green horizontal ones during the binocular rivalry task. 

A relatively new set of findings shows that conscious mental imagery influ- 
ences the patterns of these alternating experiences in much the same way as 
conscious perception does (Pearson et al. 2008, 2011; Keogh and Pearson 
2011). Again, suppose that an image of red vertical lines is presented to the 
right eye and an image of green horizontal lines is presented to the left eye. If 
you visualized a red apple before the binocular rivalry phase, your right eye 
(where red vertical lines are presented) is more likely to win out in the binoc- 
ular rivalry. 

So conscious mental imagery has an impact on the binocular rivalry pat- 
tern. The question is whether unconscious mental imagery has a similar 
impact. Consider the following experiment (Kwok et al. 2019). Subjects were 
shown a two-word description of an object on the screen for 3 seconds, which 
included either the word “red” or the word “green” (“red apple,” “red chili,” 
“green apple,” etc.). After this, they were instructed either to imagine or to 
avoid imagining the described object (say, the red apple). In the “avoid imag- 
ining” condition, if the subject did in fact, in spite of the instructions, imagine 
a red apple (or anything red), they had to push a button indicating this. 

After this priming phase, the subjects did the classic binocular rivalry task, 
with a red stimulus presented to one of the eyes and green to the other and 
the subjects had to report which color was dominant. A number of control 
conditions were added, the most important of which was identical to the 
“avoid imagining” condition, with the exception that during the 7 seconds 
when the subject was supposed to avoid imagining, a highly luminous (nei- 
ther green nor red but neutral yellow) stimulus was presented in the subject's 
visual field (Figure 2). 

Figure 2 From Kwok et al. (2019) 

The experimenters found that subjects’ binocular rivalry pattern was primed 
just as much in the “avoiding imagining” condition as in the “imagining” 
condition. When the subjects imagined a red apple for 7 seconds, the red 
experience won out systematically in their subsequent binocular rivalry task. 
No surprise here. What is more surprising is that even when the subjects were 
avoiding imagining a red apple (and, again, ruling out those cases when they 
failed to avoid this) for 7 seconds, the red experience still won out systemati- 
cally in their subsequent binocular rivalry task. 

This result in itself does not rule out the possibility that what primed the 
binocular rivalry pattern was not unconscious mental imagery, but rather 
some sort of higher-level representation—maybe the linguistic representation 
presented on the screen (“red apple”) or some other cognitive strategy (see 
Pearson and Keogh 2019 on the diversity of cognitive strategies in visual 
working memory tasks, for example). In order to rule out this possibility, the 
experimenters added the control condition where during the 7-second “avoid 
imagining” phase, the subjects were presented with a highly luminous (nei- 
ther green nor red but neutral yellow) stimulus, which flushes out the early 
visual cortices, interfering with mental imagery (see Sherwood and Pearson 
2010). If the priming were really due to non-sensory (say, linguistic) pro- 
cesses, then this should not make a difference. But it did. In the luminance 
condition, avoiding imagining failed to produce the same priming effect as 
avoiding imagining without the luminance manipulation did (or as straight 
imagining did). 

These results strongly indicate that it is unconscious mental imagery that 
primes the binocular rivalry pattern. Remember that subjects had to indicate 
if their attempt to avoid imagining a red apple broke down. So we know that 
those subjects whose attempt to avoid imagining a red apple did not break 
down had no awareness of any red mental imagery during the 7-second 
period. This unconscious episode nonetheless produced the same priming 
effect as the conscious one did. Finally, we know that this unconscious epi- 
sode was in fact unconscious mental imagery (and not some kind of uncon- 
scious higher-level (maybe linguistic) representation) given that sensory 
presentation of an irrelevant sensory stimulus interfered with the prim- 
ing effect. 

The authors of this study did not explicitly draw the conclusion that the 
experiment demonstrates the presence of unconscious mental imagery, but at 
least one of the authors of the study would be open to this interpretation (see 
Pearson 2019; Koenig-Robert and Pearson 2020; but see also Koenig-Robert 
and Pearson 2019). 

The second argument for unconscious mental imagery is from aphantasia. 
My previous arguments for unconscious mental imagery piggybacked on 
argumentative strategies about unconscious perception. Skeptics about 
unconscious perception in general would be equally skeptical about these 
arguments. The argument I will give here does not rely on any arguments 
about unconscious perception. It is about aphantasics: people with aphantasia. 

As we have seen in Chapter 3, subjects with aphantasia report not having any 
conscious mental imagery (Zeman et al. 2007, 2010, 2015). While it would, of 
course, be possible to argue that subjects with aphantasia are not really 
unaware of their mental imagery, this may be a more difficult (in any case, 
different) move than arguing against unconscious priming or unconscious 
perception in unilateral neglect. So this line of argument may convince some 
of the skeptics who are not convinced by any arguments concerning uncon- 
scious perception. 

A subject has aphantasia when she reports not having mental imagery— 
this is a behavioral criterion. And, as a result, aphantasia may have a number 
of diverse underlying conditions. Some subjects with aphantasia have difficul- 
ties voluntarily conjuring up mental imagery, but they do report mental 
imagery when dreaming, for example. Some others report no mental imagery 
at all—either voluntary or involuntary. 

In other words, aphantasia is not a monolithic category. Some aphantasics 
have visual flashbacks and dream vivid dreams, clearly involving conscious 
visual imagery (Zeman et al. 2020). But they have problems with the voluntary 
control of conscious visual imagery. Others have no conscious visual imagery 
at all. My claim is that, at least in some cases of aphantasia, we can explain the 
behavior of the subjects better if we postulate that they have unconscious 
mental imagery. At least some aphantasics do have mental imagery, but they 
do not have conscious mental imagery. 

One experiment (Jacobs et al. 2017) that very much supports my claim has 
a very small sample size: one. This one subject, AI (not actual initials), is a 
31-year-old female PhD student, who scored 16 points on the Vividness of 
Visual Imagery Questionnaire (that is the lowest possible score—the average 
score of the control group was 61.1 points). She reports not having any mental 
imagery whatsoever. 

The experimental design is the following (Figure 3): the subject first sees 
the name of a geometric shape (for example, “triangle” or “diamond”) for 500 
milliseconds. Then she either sees the geometric shape in question framed by 
four placeholders for 1500 milliseconds or is instructed to imagine this geo- 
metric shape within the four perceived placeholders. In the latter condition, 
only the four placeholders are shown—four dots indicating the corners of the 
square within which the geometric shape is to be imagined. This is followed 
by 200 milliseconds of a random noise stimulus to mask the potential 
afterimages. 

Figure 3 From Jacobs et al. (2017) 

After 4 seconds of delay, a random dot is presented, and the subject has to 
decide whether it is within or without the boundaries of the perceived/imag- 
ined geometrical shape. This is followed by a confidence rating of this judg- 
ment. The only difference between the first (working memory) condition and 
the second (mental imagery) condition is that in the latter, the geometrical 
shape is not seen, but merely imagined within the region indicated by the four 
placeholders. 

The striking finding is that the performance of AI, a subject with aphanta- 
sia, was not significantly different from controls on either of these tasks. 
Controls performed with a 90 percent success rate on the working memory 
task and with an 89 percent success rate on the mental imagery task. AI per- 
formed just around 3 percent worse than the controls, which is a statistically 
insignificant difference. Her confidence ratings were also very similar to those 
of the control subjects (and generally quite high, between 3 and 4 on a 1-4 
scale). The authors’ conclusion is that the subject’s aphantasia did not have a 
statistically significant effect on the performance of either of these tasks. 

Let’s set the working memory task aside. How could we explain the find- 
ing that an aphantasic subject’s performance on the mental imagery task is 
not significantly worse than the controls’ performance? The straightforward 
explanation is that the subject does use mental imagery and uses it in a very 
similar way to the control subjects when performing the mental imagery 
task. But while the controls use conscious mental imagery, the subject uses 
unconscious mental imagery. 

So far, evidence seems to support the claim that at least some subjects with 
aphantasia have unconscious mental imagery. But another experimental find- 
ing, on the face of it, seems to go against the existence of unconscious mental 
imagery among aphantasics. This experiment (Keogh and Pearson 2018), like 
the experiment I discussed above, also uses the binocular rivalry paradigm. 
Again, we know that conscious mental imagery influences the patterns of bin- 
ocular rivalry. The question is, how does this process unfold among aphantasics? 

Participants were first taught that upon the presentation of the letter G, 
they are supposed to imagine green vertical lines and upon the presentation 
of the letter R, they should imagine red horizontal lines. During the experi- 
ment, they were shown one of these letters, which cued them to imagine 
either red horizontal or green vertical lines for 6 seconds. After this, they 
rated how vivid their imagery of the lines was. Finally, this was followed by 
the binocular rivalry task with red horizontal lines presented to one eye and 
green vertical lines presented to the other (see Figure 4). 

There was no statistically significant effect of the imagining task on the bin- 
ocular rivalry performance of subjects with aphantasia. While imagining red 
lines in the case of control subjects led to the dominance of the red lines in 
the binocular rivalry, in the case of subjects with aphantasia, this effect was 
missing. 

Figure 4 From Keogh and Pearson (2018) 

There was another difference between aphantasics and control subjects. In 
control subjects, the priming effect was significantly weakened when, during 
the imagery phase, a luminous (neither red, nor green, but neutral yellow) 
stimulus was shown. But in aphantasics, the luminous stimulus made no dif- 
ference (it also made no difference to the invariably low ratings of the vivid- 
ness of their mental imagery). 

On the face of it, these findings may seem to show that aphantasics do not 
have any imagery, conscious or unconscious. If they had unconscious mental 
imagery, it would have primed the binocular rivalry performance. But it 
didn’t. One could object that, as aphantasia is not a monolithic phenomenon, 
it is not to be assumed that all aphantasics have unconscious mental imagery, 
so maybe the subjects in this study don’t. However, given that there were fif- 
teen subjects, all of whom showed the same response pattern, this is not a 
very satisfying response. 

At this point, however, it is helpful to compare these results to those of the Kwok 
et al. (2019) study I discussed above (which was conducted by the same 
group of researchers). I used the Kwok et al. (2019) study to show that uncon- 
scious mental imagery primes the binocular rivalry performance. So if, as 
I argued above, at least some aphantasics have unconscious mental imagery, 
their unconscious mental imagery should also prime the binocular rivalry 
performance. But, as the Keogh and Pearson (2018) experiment shows, this is 
not the case. 

In response to this objection, a crucial difference between the two experi- 
mental setups needs to be pointed out. The mental imagery that was supposed 
to be triggered in the Keogh and Pearson (2018) experiment is voluntary 
mental imagery. The subjects are asked to visualize a certain stimulus, they 
count to three and they voluntarily try to conjure up the mental imagery. In 
the Kwok et al. (2019) study, in contrast, the unconscious mental imagery is 
involuntarily triggered. In fact, the subjects are trying not to have any 
imagery—they voluntarily suppress any conscious mental imagery. 

So the only conclusion we can draw from the Keogh and Pearson (2018) 
experiment concerning the mental imagery of subjects with aphantasia is that 
their voluntary mental imagery does not prime their binocular rivalry perfor- 
mance. This says nothing about the possibility that involuntary unconscious 
mental imagery (in aphantasics or control subjects) would or could prime 
binocular rivalry performance. And, as the Kwok et al. (2019) study shows, 
involuntary unconscious mental imagery, at least in non-aphantasic subjects, 
does prime binocular rivalry performance—so nothing excludes the possibility 
that it does so also in subjects with aphantasia. 

A lot more experimental studies could and should be done on subjects with 
aphantasia that could convince us conclusively that at least some aphantasics 
do have unconscious mental imagery. Very few neuroimaging studies have 
been done on aphantasics. And given the non-monolithic nature of aphanta- 
sia, we should expect a variety of different results here. But the hypothesis 
that some subjects with aphantasia have unconscious mental imagery could 
be very easily confirmed. 

The studies I have focused on show that unconscious mental imagery in 
aphantasics and conscious mental imagery in control subjects have the same 
behavioral profile when performing a certain task (see also Pounder et al. 
2022). But do they have the same neural profile? What do the visual cortices 
of the subject AI do in the Jacobs et al. (2017) experiment? Given that we can 
decode the contents of visual imagery from the activation of V1 and V2 (see, 
for example, Naselaris et al. 2015), it would be relatively easy to check whether 
AI in fact had a retinotopic representation of a diamond or a triangle in V1 
and V2. If so, then it would be very difficult to argue that she does not have 
unconscious mental imagery. 

I argued in this chapter that mental imagery may be, and often is, uncon- 
scious. Just as mental imagery can be voluntary or involuntary, it can also be 
conscious or unconscious. Unconscious mental imagery will play an impor- 
tant role throughout the book. 


5 
The Unity of Mental Imagery 


The way of thinking about mental imagery that I have outlined in the last few 
chapters lands us with a very wide category that would encompass a lot of 
very different mental phenomena. Some may think this is a bad thing. I believe 
that this is in fact an attractive feature of my way of thinking about mental 
imagery—it gives us a very high-level category of mental imagery, which 
we can then divide up (along the lines of the distinctions I enumerated in 
Chapter 3, among others) into useful subcategories. 

But the broader category of mental imagery is useful because it gives us 
more explanatory unification than would other, more fragmented, concepts 
of mental imagery, as I will argue in the following chapters. Explanatory uni- 
fication is a theoretical virtue of scientific (and also philosophical) theories 
(Kitcher 1981). The more diverse sets of findings a theory can explain, the 
more unified it is. My claim is that considering mental imagery to be percep- 
tual processing that is not directly triggered by sensory input gives us a highly 
unified theory of various mental phenomena in this sense. 

In fact, all the following mental phenomena would count as mental imagery 
according to this definition (some not obviously so): 


(a) “Filling in” the blind spot: A part of the retina—the blind spot—cannot 
be stimulated—there are no receptors there. If the light hits this part of 
the retina it gives rise to no perceptual processing. So we receive no 
sensory information from that region of the retina. Nonetheless, our 
perceptual system “fills in” the sensory input of the blind spot on the 
basis of the sensory input of the surrounding parts of the retina. The 
perceptual processing of information at the blind spot region of the 
visual field happens already in early visual cortices (Ramachandran 
1992; Fiorani et al. 1992; Komatsu et al. 2000; Awater et al. 2005; 
Spillman et al. 2006), but it is not directly triggered by sensory input 
because there is no sensory input at the blind spot.¹ 


1 One may object: hasn’t Daniel Dennett’s repeated skepticism about “filling-in” the blind spot (for 
example, Dennett 1991, p. 335ff) demonstrated that this story is incorrect? I don’t think so. First, there 
is plenty of empirical evidence that the early cortices do actively “fill-in” the missing part of the visual 
scene (see, for example, Churchland and Ramachandran 1993; Komatsu et al. 2000; see also Akins and 
Winger 1996 for a very good overview of this debate). Second, I’m not even sure that Dennett would 
disagree with anything I say here; his concern in Dennett (1991) was about phenomenology, namely 
whether there is conscious filling in. And I’m certainly not arguing that there is. My claim is that there 
is cortical filling in. The imagery involved in the filling in of the blind spot is almost always unconscious 
imagery. Finally, Dennett’s positive “ignoring” account could also be thought of as a version of 
my own view, according to which the mental imagery used for “filling in” the blind spot attributes very 
determinable properties only: this mental imagery is remarkably unspecific. If we frame the debate 
between the “filling in” account and Dennett’s account, the disagreement may turn out to be about the 
specificity (or determinacy) of the properties attributed to the blind spot. I will say more about the 
determinacy of mental imagery in Chapter 10. 

(b) Peripheral vision: Peripheral regions of the retina are much less sensitive 
than focal ones. And this focal preference is even stronger in early 
cortical processing. As a result, the represented properties in the 
peripheral regions of the visual field that our perceptual system processes 
are much less determinate than the properties of the focal 
regions. This asymmetry is especially striking when it comes to color 
vision as there are very few retinal cells in the periphery that are sensitive 
to color information (Hansen et al. 2009). But the same is true of 
all other perceptually processed properties, like size or shape. 
Peripheral vision can also “fill in” some regions of the periphery. 
“Artificial scotoma” is a region of the visual field where different sensory 
stimulation is induced from what surrounds it (and this can be 
no sensory stimulation surrounded by a pattern: for example, a small 
patch of white in the middle of random visual noise). If this is presented 
in the periphery, the visual system fills in the scotoma, making 
it blend in. This filling in process starts very early in visual processing 
(Ramachandran and Gregory 1991; De Weerd et al. 1995, 1998, 2006; 
Welchman and Harris 2001; Weil et al. 2007, 2008; Troncoso et al. 
2008). Again, the perceptual processing (of the pattern of random 
visual noise) is not directly triggered by the sensory stimulation at the 
artificial scotoma (because there is no input at all). This is perceptual 
processing not directly triggered by sensory input, hence, an instance 
of mental imagery. 

(c) Amodal completion: Amodal completion is the representation of those 
parts of the perceived object from which we get no sensory stimulation. 
In the case of vision, it is the representation of occluded parts of 
objects we see: when we see a cat behind a picket fence, our perceptual 
system represents those parts of the cat that are occluded by the picket 
fence. In tactile perception, it is the completion of those parts of the 
objects we touch that are not in direct contact with our hand, for 
example. We complete those parts amodally.² Amodal completion is, 
as I will argue in Chapter 8, perceptual processing that is not directly 
triggered by sensory input. It is a form of mental imagery. 

Figure 5 Color spreading illusion 

(d) Various optical illusions: Many (but certainly not all) optical illusions 
depend on perceptual processes that are not directly triggered by sen- 
sory input. One example is the color spreading illusion: this is an opti- 
cal illusion where we see a grid against a white background and some 
parts of the grid are colored dark gray, while the rest of the grid is 
lighter gray (see Figure 5). 

When seen from the right distance, those regions of the white 
background that are surrounded by a darker gray grid are perceived as 
(very light) gray. Again, these illusory contours are not directly triggered 
by sensory input. We get monochrome white regions on the retina, but 
there is processing already in the early visual cortices and this leads 
to the experience of gray (Watanabe and Sato 1989)—thus the optical 
illusion. Other optical illusions that depend on perceptual processes 
not triggered directly by sensory input include the McCollough effect 
(where the sensory stimulation is in black and white, but the early 
cortical processing as well as the visual experience is of color), the 
flickering screen illusion (again, sensory stimulation is black and white, 
whereas the early cortical processing as well as the visual experience is 
of various color patterns) and the phantom motion illusion (where the 
sensory stimulation is motionless, but the cortical processing as well 
as the visual experience is of motion) (see, for example, Vul et al. 2008; 
Allefeld et al. 2011; and see Grossberg and Mingolla 1985 for a general 
overview). It is important that this analysis does not apply to all optical 
illusions. Refractionary optical illusions, like seeing the straw in a 
glass of water as broken, have nothing to do with mental imagery: 
whatever is responsible for the illusion happens before the light hits 
the retina. Retinal illusions, like the Hermann grid (Figure 6), don’t 
count as mental imagery either: what goes astray here happens during 
retinal processing, and early perceptual processing is directly triggered 
by this (already nonveridical) sensory input. But in the case of some 
illusions, things go astray between the sensory stimulation and early 
perceptual processing; these are cases of mental imagery. 

Figure 6 Hermann grid illusion 

2 Note that the term “amodal” is a bit of a misnomer here: amodal completion in the visual sense 
modality by any account happens visually. 


(e) Hallucination: Most (albeit not all) instances of hallucinations would 
count as mental imagery according to the definition of mental imagery 
I used (and also according to the definitions of hallucinations used in 
the psychiatry literature; see Allen 2015 and Nanay 2016a for summa- 
ries): it is perceptual processing that is not directly triggered by sen- 
sory input. Here is the official medical definition from the American 
Psychological Association’s Dictionary of Psychology: “a false sensory 
perception that has the compelling sense of reality despite the absence 
of an external stimulus” (VandenBos 2007, p. 427). If we think of 
hallucination in this way, it very clearly falls under the definitions of 
mental imagery in psychology that I considered above (Shepard 1978; 
Kosslyn et al. 1995a; Pearson et al. 2015). Hallucination very much 
qualifies as “‘seeing’ in the absence of the appropriate immediate sensory 
input” (as Kosslyn et al. 1995a would say). And it is also a representation 
“of sensory information without a direct external stimulus” 
(as Pearson et al. 2015 would say).³ And it also fits my own definition: 
it is perceptual processing that is not directly triggered by sensory 
input (see David 2004, p. 108; see also Aleman and de Haan 1998, 
p. 657 for very similar definitions; and the first chapter of Aleman and 
Laroi 2008 for a good overview on defining hallucination). Crucially, 
while hallucinations form a very diverse set of mental phenomena, the 
early sensory cortices are activated in the vast majority of them (see 
Kompus et al. 2011 for a meta-analysis and Allen et al. 2008 for a 
summary; see also Henkin et al. 2000 for some neuroimaging findings 
on hallucination in less-researched sense modalities of olfaction and 
taste).⁴ But it is mental imagery that is conscious, involuntary, localizes 
egocentrically, and that is accompanied by the feeling of presence. 
(f) Dreaming: Dreaming is widely held to be one particular form of men- 
tal imagery (Hobbes 1655; Walton 1990). And it counts as mental 
imagery according to my definition as well: it is perceptual processing 
that is not directly triggered by sensory input. We have already seen 
that an example of mental imagery that is accompanied by the feeling 
of presence is lucid dreaming. But non-lucid dreaming would also 
count as mental imagery according to my definition—where mental 
imagery may or may not be accompanied by the feeling of presence 
(there seem to be a lot of individual differences in this respect). It is 
also important that dreaming involves early cortical processing, so 
much so that we can reconstruct dreamed objects and scenes just from 
the activation of the cortical regions V1-V4 (Horikawa and Kamitani 
2017). As there is early cortical perceptual processing but no sensory 
stimulation, this is a clear example of mental imagery. 

3 The minority that considers hallucination to be different from mental imagery (see Ffytche 2008) 
very clearly means something completely different by mental imagery from the psychological consensus. 
More precisely, Ffytche (2008) takes mental imagery to be necessarily voluntary and we have seen 
that this is a highly problematic and unmotivated assumption. 

4 What may constitute an exception is verbal hallucination in schizophrenia, which seems to be 
brought about by activations of the parts of the brain that are responsible for inner speech (Frith and 
Done 1988). But it is worth noting that these findings are consistent with activity in the primary auditory 
cortex (and there is some evidence that there is indeed activity in the primary auditory cortex, 
which would make verbal hallucination in schizophrenia also a form of mental imagery; see Jones and 
Fernyhough 2007 and Kompus et al. 2013). 

(g) Episodic memory: Mental imagery is also taken to be a crucial, maybe 
even necessary, feature of episodic memory. One empirical reason for 
this is that the loss of the capacity to form mental imagery results in 
the loss (or loss of scope) of episodic memory (Byrne et al. 2007; see 
also Berryhill et al.’s 2007 overview). An even more important set of 
findings is that relevant sensory cortical areas are reactivated when 
we recall an experience (Wheeler et al. 2000; see also Gelbard-Sagiv 
et al. 2008 for context). Again, the subject’s perceptual processing 
(Brewin 2014) is certainly not directly triggered by sensory input (see 
also Laeng et al. 2014 for further empirical evidence on the relation 
between mental imagery and episodic memory and see Chapter 20 
for more discussion of the relation between mental imagery and 
memory). 

Perceptual expectations: It has been shown that prior expectations of a 
specific stimulus evoke a feature-specific pattern of activity in V1 
similar to that evoked by the actual stimulus (Kok et al. 2014, 2017). 
This also counts as mental imagery: perceptual processing that is not 
directly triggered by sensory input. Perceptual expectations will play 
an important role when I discuss temporal mental imagery in 
Chapter 12. 

Attentional templates: In visual search, we use attentional templates. 
When we look for Waldo in the Where’s Waldo book, we have a red 
and white striped attentional template (Stokes and Nanay 2020). When 
we look for the car keys in the living room, we have a key-shaped 
attentional template. We know from studies in the neuroscience of 
attentional templates that the templates used in visual search 
are in fact early cortical processing in the visual system that is not trig- 
gered directly by sensory stimulation—in short, they count as mental 
imagery (Keogh and Pearson 2021). 


These cases (a)-(i) are, on the face of it, quite heterogeneous. The common 
denominator between them is that all of them are perceptual processes that 
are not directly triggered by sensory input—either because there is no sen- 
sory input at all (as in (a), (e), (f), and some instances of (b) and (d)) or 
because the sensory input does not trigger our perceptual processing (that is, 
the early cortical processing) directly. 
But isn’t this way of thinking about mental imagery just too inclusive? 
What will not count as mental imagery according to this definition? Let’s 
consider a couple of examples. 

Afterimages do not count as mental imagery according to my definition. 
An afterimage is “an image seen immediately after the intense stimulation of 
the eye by light has ceased. For about a second, the afterimage is ‘positive’, and 
then it turns to ‘negative’ often with fleeting colours. The positive phase is due 
to after-discharge of the receptors of the eye; the negative phase is caused by 
loss of sensitivity of the receptors as a result of bleaching of the photo- 
pigments by the intense light” (Gregory 1987, p. 13; see Phillips 2013 for a 
philosophical overview; see also Sperandio et al. 2012 for some further wrin- 
kles about how afterimages are subject to size constancy). In the case of after- 
images, there is retinal activation, and the perceptual process it leads to is 
indeed directly triggered by sensory input (that is, by this retinal activation). 
The sensory stimulation is not light hitting the retina, but rather an after- 
effect of the light hitting the retina, but it is an event of retinal activation 
nonetheless—it counts as sensory stimulation. And afterimages are triggered 
directly by this sensory stimulation. 

Further, in the Perky experiment, one of the most famous and earliest 
experiments about mental imagery, subjects are looking at a white wall and 
are asked to visualize objects while keeping their eyes open. Unbeknownst to 
them, barely visible images of the visualized objects are projected on the wall, 
which they take themselves to be visualizing, not perceiving (Perky 1910; 
Segal and Nathan 1964; Segal 1972; see also Dijkstra et al. 2021). I will say 
more about what follows and what does not follow from the Perky experi- 
ments in Chapter 10, but what is important here is that if we accept my defini- 
tion (and the definition used in the psychological literature), the subjects in 
these experiments do perceive (rather than have mental imagery of) the pro- 
jected images: their perceptual processes are directly triggered by sensory 
input. I will come back to two other examples of early cortical processes 
(hyperacuity and constancies) that do not count as mental imagery in 
Chapter 9. 

A sure sign of the explanatory power of a theory is that it can help us to 
keep apart seemingly similar, but in fact very different, mental phenomena. 
And my account of mental imagery can do exactly that. A good example of 
this is phosphenes. 

Retinal cells are normally activated by light. But they can also be activated 
by merely pushing your fingers against your eyeballs. This results in what is 
called “pressure phosphenes.” Pressure phosphenes are caused by sensory 
stimulation. So they do not count as mental imagery. They amount to percep- 
tual processing that is caused directly by sensory stimulation (of your fingers 
pushing against your eyeballs). But the more general category “phosphene” is 
somewhat unfortunate as it lumps together activation of the retina in the 
absence of light, and the activation of the early visual cortices in the absence 
of light and of retinal stimulation (often called electrically or magnetically 
induced phosphenes). The latter counts as mental imagery, the former doesn't. 

A precondition of any scientific endeavor, but also of any philosophical 
endeavor, is to carve the world at its joints: to use concepts that pick out 
natural kinds in the world we are trying to understand. I am somewhat skeptical 
of the explanatory value of the concept of natural kind in general, for various 
reasons (see Nanay 2011c, e, 2013b, 2014a), but for the purposes of this 
discussion, let’s just run with it. What are the natural kinds of our mental life? 

The phosphene example I just gave can be interpreted in the following way: 
If we follow our introspective reports, we lump together pressure phosphenes 
that are brought about because of interference with retinal stimulation and 
magnetically induced phosphenes, where something goes wrong between the 
retina and the visual cortices. But these are radically different mechanisms. 
So this concept of phosphenes does not pick out a natural kind. The concept 
of mental imagery, as I use it, in contrast, does pick out a natural kind. 

More generally, my claim is that the concept of mental imagery, understood 
as perceptual processing that is not directly triggered by sensory input, is very 
high up on the naturalness scale. Perceptual processing is an important natu- 
ral kind. And we cut this natural kind at its joints if we distinguish those per- 
ceptual processes that are directly triggered by sensory input and those that 
are not. Distinctions like voluntary/involuntary or conscious/unconscious 
would be much further down on the naturalness scale. 

This does not mean that we can’t and shouldn't make important distinc- 
tions within the natural kind of mental imagery—we most certainly do so 
when talking about the natural kind of water (if water is indeed a natural 
kind, see Chang 2012): for example, whether it’s solid, liquid, or gas. But the 
distinction between ice and steam will be a theoretically less-important (and 
less-interesting) one than the one between H₂O and, say, O₂. 

We can, and indeed should, make distinctions between different kinds of 
mental imagery—in Chapters 3 and 4, I attempted to highlight a couple of 
such distinctions (like the one between voluntary and involuntary or between 
conscious and unconscious mental imagery). But these distinctions are more 
similar to the one between ice and steam. 
And the same goes for the distinction between top-down and laterally 
triggered mental imagery, which is easier to make sense of in the light of the 
examples given in this chapter. Cases (a), (b), and most examples of (d) (that 
is, blind spot, peripheral vision, and some optical illusions) are good exam- 
ples of laterally triggered mental imagery. While there is no sensory input at 
the blind spot or at the periphery of our visual field, the perceptual processing 
of this missing input is triggered laterally. It is the retinal activation around 
the blind spot that drives the perceptual processing of the regions around the 
blind spot, which, in turn, triggers the perceptual processing of the missing 
sensory stimulation at the blind spot. And it is the sensory stimulation of 
parts of the visual field that are closer to the fovea that drives the perceptual 
processing of these regions, which, in turn, triggers the filling in of the miss- 
ing sensory stimulation in peripheral vision (it should be noted, though, that 
even in the case of peripheral vision, top-down information may play a role in 
how the missing sensory information gets processed, see, for example, Zhang 
et al. 2009). 

Cases (e), (f), (h), and (i) (that is, hallucination, dreaming, expectations, 
and attentional templates) are examples of fully top-down mental imagery: 
dreams and hallucinations are triggered by some higher-level processes 
regardless of the actual sensory stimulation. (It should be added that the 
actual input may be incorporated into the mental imagery, like the actual 
sound of the alarm clock becoming a sound I seem to hear in my dream or 
the flushing of the toilet that triggers one of the most common hallucinatory 
experiences of voices—I will say more about these hybrid cases in Chapter 9). 

Case (c) (amodal completion) is a mixture of top-down and lateral 
mental imagery. I will say more about how they are mixed in Chapter 9 and 
more about the top-down versus bottom-up versus lateral distinction in 
Chapter 11. 

A final potential worry needs to be addressed. I defined mental imagery as 
perceptual processing not directly triggered by the sensory input. But, given 
recent advances in neuroimaging technology, we can now make more fine- 
grained distinctions when it comes to perceptual, especially early sensory, 
processing. The primary visual cortex is a case in point. Anatomically, the 
primary visual cortex has seven layers. The middle (4th) layer consists 
mainly of bottom-up information from the retina, whereas all the other layers 
consist mainly of top-down information. But there is also a difference between 
superficial (closer to the skull) and deep (further away from the skull) layers of 
the primary visual cortex and some neuroscientists want to reserve the concept 
of mental imagery for processing in the deep layers (see for example, Bergmann 
et al. 2019). If you close your eyes and visualize an apple, this content can be 
decoded from the deep layers of V1 (but not the superficial ones). And some 
of the perceptual illusions I talked about above (in (d)), like the color spread- 
ing illusion, can be decoded from the superficial layers of V1 (but not the 
deep ones). One might think that this could be a reason to posit a substantial 
distinction between mental imagery on the one hand (which amounts to 
processing in the deep layers of V1) and other forms of perceptual processing that is 
not directly triggered by sensory input (which amount to processing in the 
superficial layers of V1). 

The problem with this line of reasoning is that the functions of the deep and 
superficial layers of V1 are not as different as one might suppose. Both the 
deep and the superficial layers get a lot of top-down input and while it is true 
that the deep layers get more top-down input from further away brain regions 
than the superficial layers, this difference is a matter of degree: top-down 
signals from far-away brain regions can be detected not only in the deep, but 
also in the superficial layers, and top-down signals from relatively close brain 
regions also reach the deep layers of V1 (Koenig-Robert and Pearson 2021). 
Further, most examples I outlined above (and that we have data about) 
activate both the deep and the superficial layers of V1, for example mental 
rotation (Iamshchinina et al. 2021) and working memory (Lawrence et al. 2018). 
Perceptual expectations are also decodable from the deep layers (Aitken et al. 
2020). Finally, while there is some evidence that amodal completion activates 
the superficial layers (Muckli et al. 2015), it has been shown that it also acti- 
vates the deep layers (Kok et al. 2016). 

In short, the distinction between processing in the superficial versus deep 
layers is very far away from being an absolute one. Mental imagery encom- 
passes both. 


6 
The Content of Mental Imagery 


Mental imagery is a kind of representation. When I visualize an apple, this is a 
way of representing an apple. And when I amodally complete the back side of 
an apple, this is a way of representing the back side of the apple. This is con- 
sistent with the way psychologists talk about mental imagery. Recall the defi- 
nition of mental imagery we encountered in Chapter 1 from a review article 
on the psychology of mental imagery: “We use the term ‘mental imagery’ to 
refer to representations [...] of sensory information without a direct external 
stimulus” (Pearson et al. 2015). The question is: what kind of representation is 
involved in mental imagery and how should we think about its content? 

I need to say more about how representations show up in early perceptual 
processing and what kinds of representations do so. My claim is that early 
cortical perceptual processing is representational. This is not just a claim 
about mental imagery. Early cortical perceptual processing is representa- 
tional, regardless of whether or not it is directly triggered by sensory stimula- 
tion. So it is representational both when we perceive and when we have 
mental imagery. In both cases, the early cortical processing of a triangle rep- 
resents a triangle. 

The representations in early cortical processing are not typically conscious 
and they are clearly not syntactically structured (that is, not structured the 
way sentences are). So they are very different from, say, conscious beliefs—the 
paradigm of mental representations for some philosophers. But they are 
nonetheless representations in any meaningful, even remotely naturalistic, 
sense of the term (Shea 2018). I will use two of the most influential accounts 
of what perceptual representations are, to show that the representations in 
early cortical perceptual processing do count as bona fide representations. 

The first account comes from Tyler Burge who takes perceptual constancies 
to be the mark of perceptual representation (Burge 2010). Perceptual repre- 
sentations represent distal features of the environment in spite of variations in 
the proximal input. When a car is driving towards you, the outline shape of 
the car takes up a larger and larger part of your retina, but you still perceptu- 
ally represent the car as having the same (distal) size. Crucially, this is already 
true of perceptual processing in the primary visual cortex: there are 
demonstrated lightness and size constancies in the primary visual cortex 
(MacEvoy and Paradiso 2001; Murray et al. 2006). In short, if constancies are 
the mark of perceptual representations, then the primary visual cortex already 
represents perceptually. 

Another influential take on representations is that a state can only be a rep- 
resentation if it can misrepresent. One way of spelling out this approach is to 
say that perceptual representations have the function to carry information 
about some external state (Dretske 1988; Millikan 1995). Even on those occa- 
sions when the perceptual representation fails to carry information about an 
external state of affairs, it still has the function to do so. And that is what 
happens when a perceptual representation misrepresents. The retina, to sim- 
plify a bit, just slavishly registers whatever is in front of it. It is not capable of 
misrepresentation—therefore it does not represent. The primary visual cor- 
tex, in contrast, is not merely registering the input. It can and does misrepre- 
sent. When we look at an illusory contour, the primary visual cortex 
represents an edge, but the edge is not there (Kok et al. 2016; see Chapter 8 
below). It misrepresents. And the same goes for the representation in V4, MT, 
and so on. Early cortical representations are bona fide representations. 

So we have a number of different early cortical perceptual representations 
(both in the case of perception and in the case of mental imagery). The repre- 
sentation in the primary visual cortex represents contours (among other fea- 
tures). The representation in V4 represents colors (among other things). And 
so on. When we talk about the content of perceptual states (of, say, seeing an 
apple on the table), the content of this overall perceptual state depends on 
(or maybe it is even determined by) the content of all these subpersonal rep- 
resentations. And when we visualize an apple, the content of our overall mental 
imagery also depends on the content of all these subpersonal representations. 

I should acknowledge that not everyone is comfortable talking about the 
content of perceptual experiences. Some philosophers argue that perceptual 
experiences do not represent anything: they are not representations of objects 
but relations to the perceived objects (Campbell 2002; Martin 2004, 2006; 
Brewer 2011; Logue 2012; French 2018). This is not the place to argue against 
such views (but see Nanay 2014d, 2015c, 2016e, 2022b; Berger and Nanay 
2016). However, I don’t see why someone who thinks of perceptual 
experiences in this way could not go along with everything I have said so far, with 
one tiny modification. 

These “relationalists” about perception are interested in conscious percep- 
tual experiences and their claim is that these conscious perceptual experi- 
ences are not representations (or else, even if they are, their content does not 
explain the phenomenal character of our perceptual experience). It would be 
consistent with relationalism to claim that subpersonal and unconscious rep- 
resentations are involved in early cortical perceptual processing. What rela- 
tionalists would deny is that the overall perceptual experience that these 
subpersonal and unconscious representations involved in early cortical per- 
ceptual processing give rise to would be a perceptual representation itself 
(and one whose content depends on the content of the subpersonal represen- 
tations in early cortical perceptual processing). Instead, they would have a 
different, but structurally similar, claim about how these early cortical percep- 
tual representations make perceptual experience possible. But relationalists 
needn't disagree with the existence and importance of early cortical percep- 
tual representations. 

Mental representations attribute properties to entities. There are important 
debates in the philosophy of perception about just what these properties 
could be and also about what kind of entities these properties are attributed 
to. But these debates tend to be about conscious perceptual experiences. The 
debate about the range of properties represented in perception is most often a 
debate about what properties we perceptually experience (Siegel 2006; Bayne 
2009; Nanay 2011a, 2011d, 2012d). So it’s about conscious perception (but see 
Nanay 2012c for an attempt to tease apart the question about what properties 
perceptual states represent and what properties we consciously experience in 
perception). 

But the same question can be raised about the perceptual representations 
involved in early cortical perceptual processing. We know that, to simplify a 
bit, the primary visual cortex represents contours, V4/V8 colors, and MT 
motions (in the sense of representation specified above). 

But what are these properties attributed to? I need to introduce a bit of ter- 
minology here. Sensory individuals are the individuals (objects or events) we 
perceptually represent as having properties. So when I see an apple, I percep- 
tually attribute some properties (say, roundness, redness) to a sensory indi- 
vidual (see Cohen 2004; Nanay 2013a on the concept of sensory individual). 

The standard story about visual perception is that a range of properties 
(definitely shape, color, and spatial location, but possibly also dispositional 
properties or natural kind properties or action properties) is attributed to 
ordinary objects (or events). So when I am looking at an apple, my perceptual 
representation attributes properties to an object: the apple. 

According to this view, the sensory individuals of vision are ordinary 
objects like an apple or a cedar tree. As David Armstrong says, “In perception, 
properties and relations are attributed to objects” (Armstrong 2004, p. 20; see 
also Shoemaker 1990, p. 97; Brewer 2007, p. 88, to mention just a few 
examples). The concept of ordinary object is not as straightforward as it might seem, 
as it should not rule out shadows and rainbows, which are not physical objects. 
Here is Mohan Matthen’s definition of what would count as an ordinary object: 
a “spatio-temporally confined and continuous entity that can move and take its 
features with it” (Matthen 2005, p. 281; see also Cohen 2004; Matthen 2004, 
2010 for similar views; but see also Clark 2000, 2004; Nanay 2013a). 

A minority position in this classic philosophy of perception debate con- 
cerning vision is that these properties are attributed to a spatiotemporal loca- 
tion and not the ordinary object that occupies this spatiotemporal location 
(Clark 2000, 2004, 2011). And while this view is often quickly dismissed in 
the case of conscious perceptual representations, the arguments against it 
tend to be introspection-based arguments (see, for example, Cohen 2004; 
Matthen 2005). They may or may not work when it comes to conscious per- 
ceptual representations, but they definitely won't work in the case of the rep- 
resentations of the early cortical perceptual processing. 

In fact, when talking about representations involved in perceptual processing 
in the primary visual cortex, it would be problematic to talk about properties 
attributed to objects, as perceptual objects only show up much later in the per- 
ceptual processing. The primary visual cortex processes contours, but these 
contours are not contours of objects—they are not bound to ordinary objects 
like apples. Same for V4: it processes colors, but not the color of objects. 

So, in the case of early cortical perceptual representations, the long dis- 
missed view, that the sensory individuals that perceptual representations 
attribute properties to are spatiotemporal regions, seems to be a much better 
candidate (Nanay 2022c). Things get messier when we turn to non-visual 
sense modalities, like audition or olfaction, partly because the philosophical 
debate about their sensory individuals is somewhat more complicated. 

The debate about what audition attributes properties to is not about ordi- 
nary objects (or events) versus spatiotemporal regions. It is about ordinary 
objects (or events) versus sounds. And in the case of olfaction, it is about 
ordinary objects versus odors. And it is not at all clear what kind of entities 
sounds and odors are. Again, these debates tend to be about conscious audi- 
tory and olfactory experiences: about what we hear and what we smell. 

But if we ask instead what sensory individuals the representations involved in 
early auditory cortical processing attribute properties to (I will leave olfaction 
aside for now because there are some further wrinkles there; see Chapter 14), 
we get a very different range of options. And, as in the case of vision, the most 
plausible candidate—one that is not even on the radar when it comes to the 
debate about auditory and olfactory sensory individuals (although there are 
some exceptions: Nanay 2013a; chapter 4 and Cohen 2010, for example)— 
seems to be spatiotemporal regions, as the main function of the primary audi- 
tory cortex is the spatial and temporal segregation of the auditory field into 
auditory units based on the input of frequencies (Bregman 1990). And if 
properties are attributed to spatiotemporal regions in perception, we have 
strong reasons to suppose that they are attributed to spatiotemporal regions 
in mental imagery as well. 

We have seen that the content of overall perceptual states depends on the 
content of these subpersonal early cortical perceptual representations. 
Similarly, the content of mental imagery also depends on the content of these 
early cortical perceptual representations. Just how the overall mental imagery 
representation is put together from the subpersonal representations of early 
cortical perceptual processing is a complicated question. 

So far, this chapter has been about representational content. But questions 
about the content of representations are difficult to separate from questions 
about the format of representations. The usual starting point of talking about 
representational format is the difference between the way pictures and sen- 
tences represent. Pictures represent imagistically or iconically and sentences 
represent non-imagistically or propositionally. They may represent the same 
thing: say, a red apple on a green table. But they represent this red apple on a 
green table differently—the format of the representation is different. 

So the question is: does mental imagery represent the way pictures do or 
the way sentences do? This was the central question of the so-called “Imagery 
Debate” of the 1980s (see Tye 1991 and Cohen 1996 for summaries). It was 
this debate that made philosophers take the concept of mental imagery seri- 
ously again, after a long period of behaviorist-inspired skepticism about any- 
thing imagery-related. 

The Imagery Debate is historically significant for yet another reason: it 
helped us to appreciate how interpersonal variations in mental imagery can 
have a major impact on one’s philosophical/theoretical positions. As we saw 
in Chapter 3, an important and fairly large study conducted at a time when 
the Imagery Debate was on its way out showed that the vividness of imagery 
has a significant impact on theoretical commitments in this debate (Reisberg 
et al. 2003), inasmuch as researchers with less-vivid mental imagery tended to 
opt for the symbolic/propositional side and those with more vivid mental 
imagery were more likely to take the iconic/imagistic side. Given the depend- 
ence on the vividness of one’s mental imagery, one might wonder just how 
substantive the Imagery Debate really was. 
There are many ways of characterizing the distinction between imagistic 
and propositional formats, some more controversial than others.¹ I take the 
least controversial way of characterizing imagistic content to be Christopher 
Peacocke’s: “representation of magnitudes, by magnitudes” (Peacocke 2019, 
p. 52—magnitudes are properties that come on a scale; see also Maley 2011; 
Beck 2014; Lee et al. 2022).² And at least according to this criterion it seems 
crystal-clear that mental imagery has imagistic format. 

Early cortical representations represent magnitudes by means of magni- 
tudes. They represent magnitudes like illumination, contours, color, and they 
do so by means of magnitudes in the early sensory cortices. We have seen in 
Chapter 1 that the early cortices are retinotopic. If you are looking at a trian- 
gle, there is a (somewhat distorted) triangle-pattern of direction-sensitive 
neurons in your primary visual cortex (see Grill-Spector and Malach 2004 for 
a summary). This is imagistic representation par excellence. And if you visu- 
alize a triangle (or if you amodally complete one) there is also a triangle- 
pattern of direction-sensitive neurons in your primary visual cortex (Kosslyn 
et al. 2006). Again, imagistic representation par excellence, at least according 
to the “representation of magnitudes by magnitudes” criterion. In this sense, 
we can agree with the recent consensus among psychologists and neuroscien- 
tists (including some of the original participants of this debate) who explicitly 
declared this debate dead (see esp. Pearson and Kosslyn 2015). 

To sum up, mental imagery represents the way perception does and under- 
standing the relation between the content of perception and the content of 
mental imagery is crucial for understanding both mental phenomena. I will 
say (much) more about the relation between perception and mental imagery 
in Chapter 7. 

The aim of Part I of the book was to make clear what mental imagery, that 
is, perceptual processing not directly triggered by sensory input, amounts to. 
The aim of Part II is to argue that mental imagery plays a very important role 
in everyday perception. 


1 To mention just one often-emphasized difference, very few parts of the sentence “there is a red 
apple on a green table” represent part of what the sentence itself represents, whereas many parts of the 
picture of the red apple on a green table represent part of what the whole picture represents (see 
Kulvicki 2014). 

2 Small terminological point: this is Peacocke’s characterization of what he calls “analogue 
representation” (which he takes to be different from iconic representations). I will use the term “imagistic” 
to refer to Peacocke’s “representation of magnitudes by magnitudes.” Less-small terminological point: 
we can bracket various conceptual issues about the logical relation between analogue and iconic/ 
imagistic representations as all parties in this debate assume that propositional representations do not 
have analogue content in this sense: propositional representations do not represent magnitudes by 
magnitudes. 


PART II 
PERCEPTION 


7 
Mental Imagery in Perception 


Some perceptual processing starts with sensory stimulation. The light hits our 
retina and vision is the complex visual processing of this sensory stimulation. 
This perceptual processing may include, depending on whom you ask, the 
interpretation or the elaboration or the embellishment of the sensory stimula- 
tion, but it is the sensory stimulation that is processed/interpreted/elabo- 
rated on. 

But some other cases of perceptual processing are not the processing of 
sensory stimulation because there is no sensory stimulation to be processed. 
These perceptual processes would count as mental imagery according to my 
definition: they are perceptual processes that are not directly triggered by sen- 
sory input. 

Observant readers could spot the potential for major terminological confu- 
sion here. I call one kind of perceptual processing (that is triggered directly by 
sensory input) perception proper. And I call another kind of perceptual pro- 
cessing (that is not directly triggered by sensory input) mental imagery. So 
some kind of perceptual processing will count as something other than per- 
ception: perceptual processing that is not directly triggered by sensory input 
counts as mental imagery, not perception. In order to mitigate this potential 
confusion, I will use the term “sensory stimulation-driven perception” to 
refer to perceptual processing that is triggered directly by sensory stimula- 
tion. So not all perceptual processing is sensory stimulation-driven percep- 
tion. Mental imagery is not. 

My main claim in the next couple of chapters is that what we pre- 
theoretically take to be perception is in fact a hybrid of sensory stimulation- 
driven perception and mental imagery. But if this is true, then we should 
reevaluate many generally held assumptions about perception. 

An old and influential (Kantian) idea about mental imagery (or imagina- 
tion) is that it is “a necessary ingredient of perception itself” (Strawson 1974, 
p. 54). The metaphor and the quote are originally from Kant (Critique of Pure 
Reason, A120, fn. a; see also Sellars 1978; Thomas 2009; Gregory 2018), but it 
had become a widespread slogan by the nineteenth century. Eugène Delacroix,
for example, wrote: “Even when we look at nature, our imagination constructs
the picture.”¹

There are many ways of substantiating this claim, some more plausible than 
others. The original Kantian idea amounts to a constitutive claim, according 
to which perception depends constitutively on imagination (or, more plausi- 
bly, on mental imagery). Constitutive dependence claims are notoriously dif- 
ficult to prove, so I will stay away from them for the purposes of this book 
(although I toyed with them in Nanay 2017b). Instead, I will defend a 
relatively modest version of this claim, according to which what we pre- 
theoretically take to be perception is in fact a hybrid of sensory stimulation- 
driven perception and mental imagery. 

A lot has been said in philosophy and psychology about the relation 
between mental imagery and perception. Most of this research has focused on 
the similarities and differences between mental imagery and perception. We 
will encounter one aspect of this question in Chapter 10 below: the way in 
which conscious mental imagery appears to be similar to conscious percep- 
tion. Visualizing an apple feels similar to seeing an apple—how can we 
explain that? 

A much more important set of findings is about the similarity of processing 
in the case of mental imagery and perception. As we have seen, there is an 
almost complete overlap between the parts of the perceptual system involved 
in mental imagery and the parts of the perceptual system involved in percep- 
tion (see, for example, Bartolomeo 2002; Kosslyn et al. 2006; Boccia et al. 
2017; but see also Lee et al. 2012 for some wrinkles, and Dijkstra et al. 2019 
and Pearson 2019 for a summary). Further, the capacity limitations (Keogh 
and Pearson 2017) as well as the patterns of cortical activation are also similar 
in perception and mental imagery (Page et al. 2011; Cichy et al. 2012; but see
also the discussion of the differences in terms of the layers of the visual
cortices involved in Chapter 5).² Finally, the similarities between the perceptual
processes of perception and imagery are also revealed by how mental imagery 
can lead to low-level perceptual learning (Tartaglia et al. 2009). 

1 Delacroix: Journal, 1859, September 1.
2 See also Cavedon-Taylor (2021a, 2021b) for a discussion of the relation between perception and
mental imagery, which is very different from mine.

Another important set of experimental findings in this context is about our
eye movements during visual imagery and visual perception (I will focus on
the visual sense modality for ease of exposition, but we have very similar
phenomena in the olfactory sense modality; see Bensafi et al. 2003): our eye
movement during visual imagery re-enacts that of the perception of the same
visual scene. When we visualize a scene, our spontaneous eye movements 
reflect the content of the visual scene (Brandt and Stark 1997; Spivey and 
Geng 2001; Laeng and Teodorescu 2002; Mast and Kosslyn 2002; Altmann 
2004; Johansson et al. 2006; see Laeng et al. 2014 for a good summary; see 
also Sartre 1940/1948 for some surprisingly similar claims). For example, 
when we perceive a pattern in a grid, our eye movements are isomorphic to 
our eye movements when we visualize the same pattern.³

The relation between eye movements in perception and mental imagery is
especially intriguing. When we look at an object that moves slowly, our eyes 
track this movement with small and smooth micro-saccades. When we visu- 
alize an object moving, the eye movement, while it follows the same general 
spatial pattern, is somewhat different. There are no smooth small micro- 
saccades, but instead larger, often voluntarily triggered eye movements. 
Interestingly, our eye movements in other forms of mental imagery are more 
similar to perception. In dreaming, for example, our eye movements are very 
much like the smooth eye movements with small micro-saccades in percep- 
tion and very different from the larger eye movements of visualizing (LaBerge 
et al. 2018). 

One may wonder whether these findings point to some kind of relation 
between the nature of eye movements and the feeling of presence. The two 
surely co-vary with each other. Perception: we get smooth micro-saccades 
and we get a feeling of presence. Dreaming: we also get both. Visualizing: we 
get neither. More research would need to be done to determine whether the 
eye movements explain the feeling of presence, or maybe the other way round 
(or neither, and we have a mere covariation between the two). 

The findings about the similarities and differences between perception and 
mental imagery will play an important role in Chapter 10. But an even more 
important (and surprising) aspect of the relation between perception and 
mental imagery, as I will argue, is that what we naively take to be perception is 
in fact a mixture of sensory stimulation-driven perception and mental 
imagery (see Chapter 9). 

3 These findings are not limited to the similarities of eye movements when it comes to perceived
and visualized shape properties. The dilation of the pupil also reflects the brightness or darkness of the
imagined scene (Laeng and Sulutvedt 2014).

Let’s go back to the Kantian constitutive dependence claim (which I don’t
endorse): that perception depends constitutively on mental imagery. Claims
about constitutive dependence are routinely contrasted with claims about
causal dependence. This contrasting claim would be that perception depends
causally on mental imagery. As my claim about the perception/mental 
imagery hybrid is weaker than the constitutive dependence claim, but 
stronger than the causal dependence claim, I want to outline briefly how it 
differs from both. 

Few would deny that perception depends causally on mental imagery—we 
have plenty of evidence that mental imagery can change our perceptual state. 
But lots of things have a causal influence on our perceptual states. For exam- 
ple, LSD in the bloodstream. That does not mean that perception depends 
constitutively on LSD in the bloodstream. Our perception can depend caus- 
ally on LSD, but not constitutively. The constitutive claim would be that men- 
tal imagery is not like LSD in this respect. 

Consider the first two perceptual processes I listed in Chapter 5 that would 
count as mental imagery under my definition: perceptual processing of the 
missing information in the blind spot and peripheral vision. Every time we 
see anything, our perceptual system processes shape and color information 
corresponding to the space of the blind spot in the visual field, but this pro- 
cessing is not directly triggered by the sensory input from the blind spot 
because there is no sensory input from the blind spot. So we use mental 
imagery each time we visually perceive anything. And the same goes for 
peripheral vision. 

In this sense, all instances of everyday perception depend on mental 
imagery. Do the examples of the blind spot and peripheral vision show that all 
instances of everyday perception depend on mental imagery constitutively? I 
don’t think so. The filling in of the blind spot or of the indeterminacies of the 
periphery does have an impact on the content and phenomenology of our 
perceptual states, but it does not make the perceptual state what it is. 

In the literature in metaphysics about the difference between causal and 
constitutive dependence (see Ylikoski 2013 for a summary), the basic assump- 
tion is that X depends on Y constitutively and not merely causally if Y is part 
of what makes X what it is. You take away Y and X is no longer X. Free elec- 
tions are constitutive of democracy. If you don't have free elections, you no 
longer have democracy. Or, to use my favorite quote on constitutive depend- 
ence from the film Caddyshack (1980), uttered by Ty Webb (Chevy Chase): 
“A flute without holes, is not a flute. A donut without a hole, is a Danish.” 

No hole, no donut. Similarly, the Kantian constitutive dependence claim 
holds that without mental imagery, perception would not be perception. I 
think that this is almost true, but not quite. More specifically, the third
example of mental imagery I gave in Chapter 5, amodal completion, could be
thought to provide good reasons for a strict constitutive dependence claim,
but I don’t think it will get all the way there. 

That is why I resist the constitutive dependence talk and will argue in 
Chapter 8 (and later in Chapter 13), instead, that what we naively take to be 
perception is a mixture of sensory stimulation-driven perception and mental 
imagery. The proportions of these two vary case by case. And there are some 
(rare) cases where the mental imagery component is completely missing, 
which is why the constitutive claim is too strong. Kant said that imagination 
(most plausibly understood as mental imagery) “is a necessary ingredient of 
perception itself.” I want to tone this down a bit and I will argue that mental
imagery is a crucial, albeit not necessary, ingredient of perception itself. But 
even this claim, even taking ordinary perception to be a mixture of sensory 
stimulation-driven perception and mental imagery, has radical consequences 
for a number of problems and debates concerning perception. 


8 


Amodal Completion 


Amodal completion is the representation of those parts of a perceived object 
that we get no sensory stimulation from. I will argue that one of the strongest 
cases for the importance of mental imagery in perception comes from amodal 
completion (as we shall see in Chapter 13, there might be an even stronger 
case from the multimodality of perception). 

In the case of vision, amodal completion is the representation of occluded 
parts of objects we see: when we see a cat behind a picket fence, our percep- 
tual system represents those parts of the cat that are occluded by the picket 
fence.¹ We also get amodal completion in non-visual sense modalities. In
tactile perception, it is the completion of those parts of the objects we touch that
are not in direct contact with our hand, for example. We complete those parts 
amodally. 

In the case of audition, when we hear a loud bang while listening to a tune, 
the auditory system continues to represent the tune even in that brief moment 
when the bang is the only auditory stimulation. The loud bang blocks (we 
could say, it occludes) part of the tune. A popular demonstration of auditory 
amodal completion is the American late night show host Jimmy Kimmel’s 
segment “A week in unnecessary censorship,” where he beeps out completely
harmless words from famous politicians, making them sound like expletives 
(see also Young and Nanay 2022 on olfactory amodal completion). 

1 The term “amodal” may come across as somewhat confusing inasmuch as it may suggest some
kind of representation that is not connected to any of the sense modalities (maybe some kind of
non-perceptual representation). But when it was introduced by Henri Michotte in the 1950s, it merely
indicated the perceptual representation of a feature that is not accompanied by the usual visual
phenomenology (see Michotte and Burke 1951; Michotte et al. 1964).

Amodal completion is not a perceptual curiosity: it is part of our ordinary
perception. It happens very rarely in real-life situations that we can perceive
an object without exercising amodal completion: in natural scenes we always
get occlusion because objects tend not to be fully transparent. Stop reading
this for a moment and look around the room. Probably very few of the objects
in your visual field are fully in view: they tend to be occluded by your desk,
your computer, your hand, and so on. More generally, every time we see an
object occluded by another object (which means in every real-life perceptual
scenario, barring odd cases of fully transparent visual scenes or very simple 
visual displays), we use amodal completion of the occluded parts of perceived 
objects (Bakin et al. 2000). And the same goes for the backside of any solid 
object—sometimes referred to as self-occlusion. Again, we do not receive any 
sensory input from the backside of solid three-dimensional objects, but there 
is nonetheless perceptual processing of this missing information—in a way 
reminiscent of more familiar cases of amodal completion (Nanay 2010a; 
Ekroll et al. 2016). 

When we see a cat behind the picket fence, we represent those parts of the 
cat that are occluded by the picket fence. The question is: how do we do so? 
What kind of representations are the ones that we use in amodal completion? 

Amodal completion is early perceptual processing of a contour that is not 
directly triggered by sensory input. It is well-documented that the amodally 
completed contour shows up already in the primary visual cortex (see Kovacs 
et al. 1995; Sugita 1999; Bakin et al. 2000; Lee and Nguyen 2001; Komatsu 
2006; Hegde et al. 2008; Lommertzen et al. 2009; Vrins et al. 2009; Smith and 
Muckli 2010; Bushnell et al. 2011; Shibata et al. 2011; Lee et al. 2012; Pan et al. 
2012; Ban et al. 2013; Emmanouil and Ro 2014; Hazenberg et al. 2014; 
Scherzer and Ekroll 2015; Thielen et al. 2019; Gerbino 2020). For example, if 
the Kanizsa triangle (Figure 1, on p. 16) is projected onto the retina, the 
direction-sensitive neurons in the primary visual cortex along the illusory 
contours of the invisible sides of the triangle are activated (Maertens et al. 
2008; Kok et al. 2016; De Haas and Schwarzkopf 2018). To put it very simply, 
on the retina, we have the Kanizsa triangle, but the V1 already represents the 
missing sides of the triangle. 

In other words, when you amodally complete the cat behind the picket 
fence, in V1 there is activation of direction-sensitive neurons that is not 
directly triggered by the retinal input of where the cat’s outlines would be, 
because there is nothing on the retina that would correspond to these con- 
tours. So amodal completion is perceptual processing that is not directly trig- 
gered by sensory input. It is mental imagery. 

Amodal completion is, of course, in some sense, driven by the retinal 
image. What determines how the occluded parts of the cat are represented is 
the retinal input from the non-occluded parts. But the representation of the 
amodally completed features is not directly triggered by the sensory stimula- 
tion. The amodally completed features would be directly triggered by retinal 
input that is homomorphic with the completed features. The perceptual
processing of the contours of the picket fence is directly triggered by the retinal
input of the contours of the picket fence. But the perceptual processing of the
contours of the cat’s occluded tail is not directly triggered by the retinal
input of the contours of the cat’s occluded tail. The perceptual processing of 
the contours of the cat’s occluded tail is mediated by the perceptual represen- 
tation of the contours of the picket fence and the cat’s other (unoccluded) 
body parts, which, in turn, are triggered by the retinal input of the contours of 
the picket fence and the cat’s other (unoccluded) body parts. This is very 
much indirect triggering (where the perceptual processing of the contours of 
the picket fence and the cat’s other (unoccluded) body parts mediate between 
the sensory input and the perceptual processing of the occluded cat parts). In 
short, it is mental imagery. 

Not everyone uses the concept of mental imagery the way I outlined (and 
what I take to be the consensus view in psychology and neuroscience). Here is 
a puzzling remark in a paper by Vebjørn Ekroll and colleagues, which may
seem to suggest that not everyone is on board with the idea that amodal com- 
pletion is a form of mental imagery: 


our experience of the hidden backsides of objects is sometimes based on 
genuine perceptual representations rather than mere cognitive guesswork or 
imagery, despite the lack of any direct sensory stimulation reaching the eye 
from the hidden backsides themselves. (Ekroll et al. 2016, p. 3) 


This quote seems to present a choice between perceptual representation on 
the one hand and cognitive guesswork and imagery on the other. In my 
account, imagery is a form of perceptual representation, so it is definitely on 
the perceptual side of this divide and has very little to do with cognitive 
guesswork. Ekroll seems to rely on a very unusual way of understanding mental
imagery, probably as active imagination (Ekroll confirmed this in a personal
communication in August 2019). If we understand mental imagery this
way, then amodal completion is clearly not imagery. If we understand mental
imagery as perceptual representation not directly triggered by sensory input,
then amodal completion is clearly mental imagery.²


2 One may worry about what happens if I visualize a cat behind a picket fence. Would this, according
to my account, amount to the mental imagery of mental imagery? This question could only be
addressed by consulting what happens in the visual cortices when visualizing a cat behind a picket 
fence. If there is early cortical activation that would correspond to the occluded parts of the visualized 
cat, then it is mental imagery. So the mental imagery of mental imagery would be mental imagery (see 
Lewis 1983; Lamarque 1987; Currie 1995c; and especially Nichols 2003 on the structurally parallel 
question about whether and in what sense imagination—not imagery—can be iterated). Thanks to 
Anders Nes for raising this objection.

Amodal completion is not sensory stimulation-driven perception because
the early cortical representation of the amodally completed contours is not 
directly triggered by the sensory input. Where the direct trigger would be— 
that is, a corresponding contour on the retina—is blatantly missing. And 
amodal completion is not a post-perceptual process either, as we have plenty 
of evidence that amodal completion happens very early in perceptual pro- 
cessing. If amodal completion happens in the primary visual cortex, it is not 
happening on the level of beliefs/non-perceptual representations—it happens 
much earlier. 

But the proponents of a post-perceptual account of amodal completion 
could argue, as a last resort, that amodally completed properties are repre- 
sented by beliefs and this, in turn, activates the primary visual cortex by 
means of some kind of top-down influence. They could maintain that the 
early cortical activation is not itself amodal completion—it is a consequence 
of the amodal completion and amodal completion itself is a post-perceptual 
phenomenon. 

There are two problems with this response, one theoretical, one empirical. 
First, the theoretical. This response would amount to saying that the retino- 
topic perception of the visible parts of the object gives rise to a non-retinotopic 
belief (the actual amodal completion), which then triggers, in a top-down 
manner, the retinotopic representation of the occluded parts of the object. So, 
by representing the occluded part by means of a belief, we lose retinotopy, 
which then somehow gets put back in for the well-demonstrated retinotopic 
activation of the primary visual cortex. Not a very plausible picture. 

Second, and more decisively, there is plenty of empirical evidence that this 
picture cannot be correct, given what we know about the timing of amodal 
completion. Amodal completion in the early cortices happens within 100-200 
milliseconds of retinal stimulation (Sekuler and Palmer 1992; Rauschenberger 
and Yantis 2001—this is true even of complex visual stimuli, like faces; see 
Chen et al. 2009; see also Lerner et al. 2004; Rauschenberger et al. 2006; and 
Yun et al. 2018 for detailed studies that track the (very quick) temporal 
unfolding of amodal completion in different parts of the visual cortex). And 
this is much much shorter than the time that would be needed for perceptual 
processing to reach all the way up to beliefs or non-perceptual representa- 
tions and then trickle all the way down again to the primary visual cortex (see 
Thorpe et al. 1996 and Lamme and Roelfsema 2000 for the temporal unfold- 
ing of visual processing in non-amodal cases). To sum up, taking amodal 
completion to be post-perceptual is not consistent with neuroimaging data 
about early cortical processing in amodal completion.

It is important to note that the timing data on how amodal completion
works does not rule out that amodal completion can be, and often is, influ- 
enced in a top-down manner (as top-down modulation does not require the 
activation going all the way up to non-retinotopic representations and then 
trickling all the way down). In fact, amodal completion is often, but not 
always, top-down influenced. 

Sometimes the way our perceptual system completes figures amodally is 
insensitive to our beliefs and expectations. It may even go straight against our 
beliefs and expectations, as in the case of the horse illusion (see Kanizsa 1972; 
see Figure 7): our visual field is filled with identical horse shapes; nonetheless,
we can't help but complete the occluded shape as one very long horse that is 
very different from all the other shapes—and not as two normal horses that 
would be very similar to the ones in our visual field. 

But some other instances of amodal completion are very much sensitive to 
top-down influences (Hazenberg et al. 2014; Hazenberg and Van Lier 2016). 
When I see you with your hands in your pockets, I amodally complete the
occluded hands in a way that is only possible for me because I have seen your 
hands (or the hands of other humans) unoccluded. Similarly, we are very 
good at amodally completing letters, numerals, words, and sentences. We are, 
predictably, much better at amodally completing words in languages we speak 
than in languages we do not speak. This, again, suggests that the mental 
imagery that is involved in amodal completion can be subject to top-down 
influences. 


Figure 7 Horse illusion

Figure 8 Bimodal picture


Here is another example. Look at Figure 8. See anything in the picture? 
Probably not. Now look at Figure 9 and go back to Figure 8 again. 

There is a huge difference in terms of your experience, and this is one of the 
most elegant demonstrations of how there can be top-down influences on our 
perceptual phenomenology. But the crucial finding from my point of view is 
not about phenomenology, but about early cortical processing: the direction- 
sensitive neurons in the primary visual cortex that are located where the com- 
pleted illusory contour is, behave very differently before and after you looked 
at Figure 9. Before looking at Figure 9, they did not fire, but afterwards, the 
neurons that are sensitive to the direction of the completed contour did fire 
(Teufel et al. 2018, see also Teufel et al. 2015). So perceptual processing in V1 
is clearly sensitive to some higher-up level of perceptual processing. I’m not
saying that this higher-up level is very high up (so that it would involve con- 
cepts), but it is definitely further up than V1. So we have a nice illustration of 
top-down-influenced amodal completion. 

Here is another example, taken from the 1980s classic comedy Top Secret. 
One of the many visual jokes of the film has the main character crawl in the 
mud, shown in close up and suddenly he faces two East German military boots, 
framed in a way that we can only see the boots. He looks scared and the camera 
zooms out, revealing that it is only two boots standing in the mud, there is no 
soldier in them. Again, we use amodal completion to represent what is outside
the frame (a theme I will say a lot more about in Chapter 31), and we use a lot
of high-level information to complete what is outside the frame, for example,
knowledge that military boots usually continue upwards in soldiers.


Figure 9 Original picture

In short, amodal completion (just as mental imagery in general) can be, but 
does not have to be, top-down driven. If one’s concept of mental imagery pre- 
supposes that mental imagery is necessarily top-down driven (see, for exam- 
ple, Briscoe 2011), they would rightly deny that all instances of amodal 
completion would be mental imagery. But my concept of mental imagery is 
neutral about whether it is driven in a top-down manner, and, as I argued in 
Chapters 1 and 3, this reflects the use of this term in the empirical sciences. 
Further, one of the advantages of having a broader concept of mental imagery 
(understood as perceptual processing not directly triggered by sensory stimu- 
lation) is that we can make these distinctions within the broader category of 
mental imagery, which can help us to understand the diversity of amodal 
completion processes and the ways they interact. 

Finally, understanding amodal completion as a subspecies of mental 
imagery can also help us to better understand the function of mental imagery 
(Nanay forthcoming b). Most instances of what we normally mean by mental 
imagery are remarkably useless, so much so that one may wonder how this 
ability may have evolved. But amodal completion is very useful indeed: spot- 
ting the lion’s tail sticking out of the bush and amodally completing the rest of 
the lion is as important for survival as perceiving the lion. 


9 
Perception/Mental Imagery Mixed Cases 


We have a simple argument for the claim that perception is a mixture of 
sensory stimulation-driven perception and mental imagery. Almost every 
case of perception is a mixture of sensory stimulation-driven perception and 
amodal completion. And amodal completion is a form of mental imagery. 
Hence, almost every case of perception is a mixture of sensory stimulation-
driven perception and mental imagery.¹

The contribution of amodal completion to perception is not trivial. 
Perceptual states represent what they represent to a large extent as a result of 
amodal completion. When we see a cat behind the picket fence, it is the 
(partly occluded) cat that we see, not just those parts of the cat that are in 
plain sight. And if we see a landscape through a wire fence, we don't just see 
disjointed squares divided up by the wire—we see the landscape. Amodal 
completion is heavily involved in making our perceptual states what they are. 

One way of demonstrating the importance of the mental imagery compo- 
nent of everyday perception is to highlight the differences between the way 
perception in fact works and the way it would work without amodal completion. 

What would perception be without amodal completion—and, thus, with- 
out mental imagery? Here are two possibilities. According to the first one, the 
amodally completed features are represented, but not amodally. According to 
the second one, the amodally completed features are not represented at all. 
The first possibility is fairly thoroughly worked out—as it is the kind of vision 
Renaissance philosophers attributed to Christ and the “blessed.” And it is a
form of transparent vision. This is what Bartholomew Rimbertinus said about 
this “heavenly sight” in 1498: 


An intervening object does not impede the vision of the blessed... If Christ,
even though himself in heaven after his Ascension, saw his dear Mother still
on Earth and at prayer in her chamber, clearly distance and the interposition
of a wall does not hinder their vision. The same is true when an object’s face
is turned away from the viewer so that an opaque body intervenes... Christ
could see the face of his mother when she was prostrate on the ground...as
if he were looking directly at her face. It is clear that the blessed can see the
front of an object from the back, the face through the back of the head.
(Bartholomew Rimbertinus, The Sensible Delights of Paradise
(Venice, 1498), 17r, cited in Baxandall 1972, p. 104, p. 172)


1 The claim is that what we pre-theoretically take to be perception is a mixture of sensory
stimulation-driven perception and mental imagery. When I refer to “perception/mental imagery
mixed cases” in this chapter, the word “perception” in this phrase is to be understood as sensory
stimulation-driven perception. So I talk about the mixture between sensory stimulation-driven
perception and mental imagery.


As John Kulvicki memorably summarized, “Hide and seek is not one of 
Paradise’s delights” (Kulvicki 2009, p. 389). If perception without amodal 
completion is heavenly sight, then amodal completion—and a fortiori, mental 
imagery—clearly makes a significant contribution to what perception is. 

I considered the possibility that perception without amodal completion 
leads to a fully transparent visual world. But this is not the only option. Maybe 
perception without amodal completion is the exact opposite of a fully trans- 
parent visual world. So suppose now that instead of amodally completing 
them, we just completely failed to represent any occluded features. This would 
mean that we fail to represent the back side of three-dimensional objects and 
anything occluded behind anything else. This would amount to having a 
visual world where only those things are represented that we get sensory 
input from. Nothing is represented behind anything else. We could not per- 
form even the simplest visually guided action if this were the way we 
perceived. This would amount to something even less akin to perception than 
heavenly sight. 

A donut without a hole is a Danish and human vision without mental 
imagery is heavenly sight (or something even weirder: flat vision). And that is 
more different from human vision than a donut is from a Danish. Wouldn't 
this be enough to conclude then that perception depends constitutively 
on mental imagery? I don’t think it would because there are perceptual 
scenarios—rare ones, but nonetheless actual perceptual scenarios—where 
amodal completion plays no role at all, for example in the case of simple two-
dimensional displays, like a red dot in the middle of a homogeneous white
background. 

As a result, I am making a weaker claim than the straight constitutive 
dependence claim: what we naively take to be perception is in fact a mixture 
of sensory stimulation-driven perception and mental imagery. And we have 
plenty of reasons to take mixed perception/mental imagery cases seriously. In 
fact, one could argue that an important desideratum for any account of
mental imagery (or of perception) is that it needs to be able to explain mixed
cases of perception/mental imagery. 

Amodal completion is one example of perception/mental imagery mixed 
cases. There are others. To use the example from Chapter 3, when I am at the 
furniture store and I am looking at a sofa, imagining how it would look in the 
living room, I am looking at an actual object and attributing imagined prop- 
erties to it—properties it does not have, like being in my living room. And 
then I go home and look around in my living room imagining the sofa (which 
is not here, it’s still in the furniture store) in my egocentric space in the living 
room. I am attributing properties (“real” properties in some sense, for exam- 
ple spatial location properties) to an imagined object—to an object that is 
not there. 

These mixed perception/mental imagery cases are widespread. Neil Van 
Leeuwen has argued for their importance in understanding pretense (Van 
Leeuwen 2011; see also Schellenberg 2013 and Chapter 26 below for more 
on pretense and imagery/perception mixed cases). Robert Briscoe calls 
them “make-perceive” and examines the role they can play in the guidance of 
our actions (Briscoe 2008, 2018; see also Martin 2002, p. 410 for other exam- 
ples of the importance of mixed perception/mental imagery cases). If it is true 
that amodal completion involves mental imagery, then these hybrid states are 
more than some rare and odd episodes. If amodal completion involves mental 
imagery, then (virtually) all of our perceptual states are in fact mixed percep- 
tion/mental imagery states. 

And this should not come as a surprise. Let’s go back to peripheral vision, 
which also counts as mental imagery according to my definition. If the 
periphery is represented by means of mental imagery and the object in the 
fovea (let’s ignore amodal completion for a moment) is represented by means 
of sensory stimulation-driven perception, then there is clearly a spectrum as 
one proceeds from the fovea to the periphery—from perception (with very 
little or maybe no mental imagery involved) through mixed perception/men- 
tal imagery cases where there is more and more contribution from mental 
imagery and less and less from perception to mental imagery with very little 
perceptual contribution. So, in general, we should expect that most mental 
states we label as perception and many states we label as mental imagery are 
in fact mixed cases of the two. 

When I say that everyday perception is a mixture of mental imagery and 
sensory stimulation-driven perception, I don’t mean to suggest that this is a 
50/50 mixture. The contribution of mental imagery can be, and often is, lim- 
ited. But this does not diminish the importance of mental imagery when 
understanding perception in general. We can't fully understand perception 
without understanding mental imagery. 

This is a good place to warn against a possible over-interpretation of these 
claims about how perception is a mixture of sensory stimulation-driven per- 
ception and mental imagery. One such over-interpretation would be to take 
anything that is not sensory stimulation-driven to be mental imagery. This is 
clearly not so. Mental imagery is perceptual processing that is not directly 
triggered by sensory input. So there will be lots of mental processes that are 
not sensory stimulation-driven that will still not count as mental imagery— 
those that do not count as perceptual processing. Further, there are many 
perceptual processes that are (in some sense) not fully sensory stimulation- 
driven, but they still fail to count as mental imagery because they are directly 
triggered by the sensory input. 

To illustrate this point, take hyperacuity. Downstream sensory discrimina- 
tion (including sensory discrimination in the primary visual cortex) is finer- 
grained than the packing of retinal cells could afford (Westheimer 1981). So 
in some cases there is higher spatial resolution in the primary visual cortex 
than in the retina. This phenomenon is called hyperacuity. Is it mental 
imagery? No. It is a classic example of early processing working with the sen- 
sory input and processing it further. There are no mediating perceptual repre- 
sentations that would trigger the representation of higher spatial resolution 
laterally. The early processing is triggered directly by the sensory input, 
although it is enriching the information that is there in the sensory stimula- 
tion. Hyperacuity is not mental imagery.” 

Things are a bit more complicated when it comes to perceptual constancies, 
but perceptual constancies provide an interesting contrast case to amodal 
completion, so I will spend some more time on this. As we have seen in 
Chapter 6, perceptual representations represent distal features of the environ- 
ment in spite of variations in the proximal input. The same color appears dif- 
ferent if the illumination conditions are different (and different colors appear 
the same if the illumination varies). Objects of the same size appear different 
if their distance from us is different. And objects of the same shape appear 
different if their orientation is different. And we know that at least some 
constancies are already present in the primary visual cortex. 


> One may wonder how hyperacuity differs from peripheral vision in this respect. The answer is 
that in the case of hyperacuity, the only information that is used to enhance the retinal resolution in 
early cortical processing is the information in the sensory input itself. No information is provided by 
perceptual representations laterally. In the case of peripheral vision, in contrast, the perceptual repre- 
sentation of the features in the periphery of the visual field is heavily influenced by representations 
both laterally (by the representation of surrounding features) and in a top-down manner (see the ref- 
erences on this in Chapter 5). 

Size constancy is a well-researched example: if two objects project on our 
retina in a way that they take up equal areas of the retina, but are at different 
distances, then the one that is further away will occupy a larger part of the 
primary visual cortex (Murray et al. 2006—see also the findings that the 
strength of size constancy depends on the size of the V1 of the subject, 
Schwarzkopf et al. 2011). And lightness constancy and even color constancy 
are also present at very early stages of perceptual processing (V1 and V4 
respectively; see MacEvoy and Paradiso 2001; Bannert and Bartels 2017).? 

Does this mean that size and color constancy would amount to mental 
imagery? Definitely not. While both color constancy and size constancy are 
early perceptual processes, they are very much directly triggered by sensory 
stimulation—by the retinal input. They do not slavishly copy the input onto 
the early cortices, but they are nonetheless triggered by this input and trig- 
gered directly. 

Constancies provide a helpful contrast to amodal completion. Both percep- 
tual phenomena are very important for perception per se, both happen in the 
early sensory cortices and both go beyond what is presented in the sensory 
stimulation. But there is a major difference between them. In the case of 
amodal completion, the early cortical processing that happens where the illu- 
sory contour is completed is not directly triggered by sensory stimulation. 
There is no sensory stimulation that is even remotely close to the part of the 
visual field that is amodally completed. The perceptual processing of the 
amodally completed region is mediated by the representation of features in 
the not amodally completed regions. 

In the case of perceptual constancies, in contrast, the early cortical pro- 
cessing that is responsible for the constancies is very much directly triggered 
by sensory stimulation. It is not mediated by any perceptual representation 
laterally. When we look at a red square surrounded by a white background, 
the perceptual processing of the color of the square is influenced by the color 
of the area surrounding the square, but this perceptual processing is not 
triggered by the color of the surrounding area. It is triggered by the color of 
the square and modified by the color of the surrounding area. The processing 
of the color of the square comes first and it is then modified by the color 
information about the surrounding area. In the case of amodal completion, 
it’s the other way round: The retinal information about the contours in the 
area surrounding the amodally completed region cannot influence the pro- 
cessing of the features of the amodally completed region, because there is no 
processing of the features of the amodally completed region in the absence of 
the representation of the features of the surrounding area. 


° A related set of findings shows that size illusion and size adaptation illusion also happen in V1 
(Fang et al. 2008; Pooresmaeili et al. 2013). Also, the size of V1 predicts the strength of size illusions 
(like the Ebbinghaus and Ponzo effects; see Schwarzkopf et al. 2011). 

What makes mental imagery special is that the sensory stimulation makes 
no direct causal contribution to the early perceptual processing. And this is 
indeed so in the case of amodal completion. But in the case of constancies, 
the sensory stimulation does make a direct causal contribution. Perceptual 
constancy is not perceptual processing that is not directly triggered by sen- 
sory input. Perceptual constancy is not mental imagery. 

There is a famous slogan in machine vision, attributed (wrongly, it seems) 
to Max Clowes, one of the pioneers of artificial intelligence research: “vision is 
controlled hallucination” (see Clowes 1971 for similar (but not identical) 
claims). It would be tempting to summarize the main claim of this chapter by 
saying that perception is controlled mental imagery. This would not be that 
significant a deviation from the original Clowes dictum, especially given that 
hallucination on my account is a form of mental imagery. I think we should 
resist the temptation to make this strong claim. Almost all perception involves 
mental imagery: perceptual processes that are not directly triggered by sen- 
sory input. But it would be a much stronger claim to say that perception is 
mental imagery (controlled or not). According to my account, mental imagery 
is an important ingredient of perception, but not a necessary ingredient (this 
was Kant’s claim) and not the only ingredient (this would be Clowes’s claim). 
Sensory stimulation does much more than just control our mental imagery. 
The main claim of this chapter is much more modest. It is that perception, 
that is, what we pre-theoretically take to be perception, is in fact the combina- 
tion of mental imagery and sensory stimulation-driven perception. 


10 
Attention and Mental Imagery 


Close your eyes and visualize the bathroom in the house you grew up in. Now 
attend to the shape of the sink. And now shift your attention to the color of 
the sink’s taps. Got it? Your mental imagery changes as you shift your atten- 
tion around, in much the same way as your perceptual state changes as you 
shift your attention around. 

This procedure is often called “image inspection” and it plays an important 
role in some psychiatric treatments, as we shall see in Chapter 30 (see also 
Kosslyn et al. 2006). The main point is that image inspection is a matter of 
moving our attention around. Thus, some parts of the mental imagery are 
attended, some others are unattended. And as our attention moves around, 
different parts become attended. 

Mental imagery can be attended or unattended. It can be, as we have seen 
in Chapter 4, conscious or unconscious. And it can be determinate or inde- 
terminate. This chapter is about the ways these distinctions relate to each 
other in mental imagery in general, but I want to demonstrate the importance 
of these distinctions in the case of the form of mental imagery that Chapter 8 
was about: amodal completion. 

Amodal completion can be attended or unattended. I can shift my atten- 
tion from one perceived object to another and I can do the same when it 
comes to amodally completed parts of perceived objects. We can attend to 
some properties of amodally represented parts of perceived objects, but nor- 
mally we ignore most of these properties (see also De Weerd et al. 2006 for 
empirical support of this). 

And amodal completion, like mental imagery in general, may be conscious 
or unconscious. Given the sheer amount of amodal completion the visual sys- 
tem needs to do at any given moment, amodal completion is normally uncon- 
scious. When I see fifty cats behind the picket fence, I do not form conscious 
mental imagery of all occluded parts of all the fifty cats. But amodal comple- 
tion can be conscious if, for example, we are really interested in some of the 
occluded features. If for some reason I need to attend to the left eye of one of 
these fifty cats and it is occluded by the fence, I am likely to represent this left 
eye consciously. 
The relation between these two distinctions (attended vs. unattended 
mental imagery and conscious vs. unconscious mental imagery) is further 
complicated by a third distinction, that of determinate versus indeterminate 
(or determinable) mental imagery. 

A bit of background: Being red is a determinate of being colored, but a 
determinable of being scarlet (Johnston 1921; Funkhouser 2006). There are 
many ways of being red and being scarlet is one of these: for something to be 
scarlet is for it to be red, in a specific way. If something is red, it also has to be 
of a certain specific shade of red. 

The determinable-determinate relation is a relative one: the same property, 
for example, of being red, can be a determinate of the determinable being 
colored, but a determinable of the determinate being scarlet. Thus, the 
determinable-determinate relation gives us a hierarchical ordering of proper- 
ties in a given property space. Properties with no further determinates, if 
there are any, are known as super-determinates. 

We have seen in Chapter 3 that mental imagery may be determinate or deter- 
minable. One way of cashing out the sense in which mental imagery is less vivid 
than perception (an old insight from David Hume) would be to say that mental 
imagery attributes less-determinate properties than perception. But as we have 
seen, much of perception (for example, peripheral vision) attributes extremely 
determinable properties (see also Nanay 2018c, 2020e for the role of deter- 
minable properties in perception) and some instances of mental imagery can 
be very determinate indeed (for example, the mental imagery of hyperphan- 
tasics, but also the mental imagery of flashbacks to some traumatic events). 

And the same distinction also applies to amodal completion. Most of the 
time we attribute very determinable properties to amodally represented parts 
of perceived objects. This point is not independent from the previous one: see 
Yeshurun and Carrasco (1998) and Nanay (2010b) on the relation between 
attention and the attribution of determinate properties (roughly: attention 
increases the determinacy of attributed properties, see below). But if, for 
some reason, you are really interested in the amodally represented parts, you 
can attribute very determinate properties (determined at least partly in a top- 
down manner). If I see you with your hand in your pocket, I am unlikely to 
have a determinate representation of how you move your fingers in your 
pocket. But if I attend to this very thing, I may attribute more determinate 
properties to the whereabouts of each of your fingers (which, again, would be 
determined (at least partly) in a top-down manner). 


1 Given that attention may be conscious or unconscious (Cohen et al. 2012; see also Chapter 20 for 
further discussion and references), the attribution of more- or less-determinate properties can also 
happen consciously or unconsciously. 




The way attention is exercised in perception and in mental imagery can 
help us to make progress in an important philosophical debate about the phe- 
nomenal similarity between seeing and visualizing. 

Look at a red apple. Now close your eyes and visualize this apple. Your per- 
ceptual state and your imagery of the apple are very similar in some respects. 
They are also different in some respects. Some of the oldest questions about 
mental imagery are about just how similar it is to, and how different it is from, 
perception. And about how we can explain this similarity (and difference). It 
should be clear that this question is about one specific subcategory of vision 
(conscious vision) and one specific subcategory of mental imagery (conscious 
and voluntary visualizing). But the lessons we can draw from this debate will 
be more general. 

A good starting point for the discussion of the similarity between mental 
imagery and perception is the Perky experiment. In this experiment, as we 
have seen in Chapter 5, subjects are looking at a white wall and they are asked 
to visualize objects while keeping their eyes open. Unbeknownst to them, 
barely visible images of the visualized objects are projected on the wall. The 
surprising finding is that the subjects take themselves to be visualizing the 
objects—while in fact they perceive them (Perky 1910; Segal and Nathan 
1964; Segal 1972). The standard interpretation of this experiment is that if 
perceiving and visualizing could be confused under these circumstances, then 
they must be phenomenally very similar (but see Hopkins 2012 for criticism 
and Nanay 2012a for a response; see also Craver-Lemley and Reeves 1992; 
Reeves and Craver-Lemley 2012; Gow 2019; Dijkstra et al. 2021). 

What explains this similarity between perception and mental imagery? An 
obvious answer to this question is that the phenomenology of these two men- 
tal states is similar because their content is similar (Ishiguro 1967; cf. Currie 
1995a, pp. 36-7; Kind 2001; Currie and Ravenscroft 2002, p. 27; Noordhof 
2002; Nanay 2015a). 

The relation between perceptual content and perceptual phenomenology 
has been an important issue in the philosophy of perception, and one influential 
view in this context is intentionalism, the view that perceptual phenomenology 
supervenes on perceptual content—that is, any difference in phenomenology 
is due to a difference in content. If the phenomenology of mental imagery 
also supervenes on the content of mental imagery, then the similarity of the 
phenomenology of mental imagery and of perception can be explained in 
a straightforward manner by the similarity of the content of these two 
mental states. 

Depending on how we think about perceptual content and the content 
of mental imagery, we get very different versions of this explanatory scheme. 
One widespread way of thinking about perceptual content is in terms of prop- 
ositional content: perceptual states, just like beliefs and desires, are attitudes 
towards a proposition. But if we think of perceptual content and the content 
of mental imagery this way (see, for example, Byrne 2009; see also Currie 
1995a, pp. 36-7; Currie and Ravenscroft 2002, p. 27),2 then it becomes less 
clear how the similarity of content would explain the similarity of phenome- 
nology. There are many propositional attitudes (beliefs, hopes, desires, etc.) 
that could share the same propositional content as perception and they do not 
seem to share the same phenomenology—at least not in the strong(er) sense 
we are trying to explain here.3 

I myself have argued that the most promising way of cashing out the phe- 
nomenal similarity between perception and mental imagery in terms of content 
is to think about content as the attribution of properties to entities (Peacocke 
1986, 1989; Burge 2010; Nanay 2010a, 2013a). Perceptual states attribute prop- 
erties to the perceived entity (the sensory individual, see Chapter 6). 

Mental imagery attributes properties to the imagined entity. While the 
entities these properties are attributed to are very different (one is imagined, 
the other is not), the properties attributed to them (and, crucially, the way 
they are attributed) are similar. And this makes the two contents similar, 
which, in turn, makes the phenomenology of the two mental states also simi- 
lar (Nanay 2015a). 

In order to maintain the generality of this account of perceptual content, 
I will say nothing about whether these attributed properties are tropes or uni- 
versals (Nanay 2012b) or how this content is structured. The question I take 
to be crucial to explaining the similarities and dissimilarities between percep- 
tion and mental imagery has to do with the degree of determinacy that these 
perceptually attributed properties have. 

Some of the properties we perceptually attribute to the perceived scene are 
determinates or even super-determinates. Some others, on the other hand, 
are determinable properties. We know that our peripheral vision is only capa- 
ble of attributing extremely determinable properties. But even some of the 
properties we perceptually attribute to the objects that are in our fovea can be 
determinable. 


2 It should be noted that Currie and Ravenscroft are somewhat ambiguous about whether they take 
perceptual content and the content of mental imagery to be propositional (see Nanay 2016b). 

3 This is not meant to be a knock-down objection to the propositional attitude version of the simi- 
lar content view. The proponents of this view would have a lot more to say, for example, by appealing 
to the similarity of contents and attitudes. See Nanay (2015a, 2015b) for more detailed treatments of 
this topic. 
Crucially, attention makes (or attempts to make) the attended property 
more determinate (see Yeshurun and Carrasco 1998 for flagship empirical evi- 
dence; see also Nanay 2010a; Stazicker 2011 for philosophical summaries). If 
I am attending to the color of my office telephone, I attribute very determinate 
(arguably super-determinate) properties to it. If, as is more often the case, I 
am not attending to the color of my office telephone, I attribute only deter- 
minable properties to it (of, say, being light-colored or maybe just being 
colored). 

An important clarification: a shift of visual attention is not to be confused 
with eye movement. It is possible to shift one’s visual attention without any 
accompanying eye movement—this is a widely researched phenomenon of 
the “covert shift of attention” (Posner 1980, 1984; Posner et al. 1984; see also 
Findlay and Gilchrist 2003). But more often the shift of attention is accompa- 
nied by eye movement, which, following the literature, I call an “overt shift of 
attention.” Both in the case of overt and of covert shifts of attention, the deter- 
minacy of the perceptually represented property changes depending on the 
allocated attention. A perk of this way of thinking about attention and per- 
ceptual content is that perceptual attention comes out as a necessary feature 
of perceptual content—something empirical accounts of attention have long 
assumed (see Nanay 2010b, 2011b; Fazekas and Nanay 2021). 

Our mental imagery also attributes various properties to various parts of 
the imagined scene. The content of imagery is the sum total of the properties 
attributed to the imagined scene. Some of these properties are determinates 
or even super-determinates. Some others are determinables. Attention makes 
(or tries to make) the attended property more determinate. 

What then is the difference between perceptual content and the content of 
mental imagery? The main difference concerns where the extra determinacy 
comes from. As we have seen, both in the case of perceptual content and in 
the case of mental imagery, attention makes the attended property more 
determinate (see also Keogh and Pearson 2017 for similarities in terms of the 
limitation of attentional capacities in perception and mental imagery). This 
increase in determinacy in the case of perception comes from the sensory 
stimulation (for some more wrinkles, see Fazekas and Nanay 2021): if I am 
attending to the color of the curtain in the top-left window of the building in 
front of me, this color will be more determinate than it was when I was not 
attending to it. This difference in determinacy is provided by the world 
itself—I can just look: the exact shade of the curtain’s color is there in front of 
me to be seen. 
In the case of mental imagery, this difference in determinacy, in contrast, is 
not provided by the sensory stimulation, for the simple reason that there is no 
direct causal link between sensory stimulation and mental imagery: if I visu- 
alize the house I grew up in and you ask me to tell you what exact color the 
curtain in the top-left window was, I can shift my attention to that color and I 
can even visualize the exact color of the curtain. However, this increase in 
determinacy is not provided by sensory stimulation (as I don't have any), but 
rather by my memories (or what I take to be my memories) or my beliefs or 
expectations. 

Clarifications: First, in the modified Perky experiments (Segal 1972), the 
picture projected on the wall and the image the subjects were asked to visu- 
alize were different, resulting in an interesting juxtaposition of the two 
images. In this case, it would be difficult to tell whether the subject perceives 
or has mental imagery—she does both (see Trehub 1991 for some further 
experiments involving mixed perception/mental imagery). The fact that 
according to my account the structure of the content of these two mental 
episodes is the same makes it easy to account for mixed cases like this (see 
Chapter 9 on such mixed perception/mental imagery cases). The increase in 
determinacy is provided by both the sensory stimulation and our memories/ 
beliefs in these cases. 

Second, my claim is not that attention makes the attended property more 
determinate, but that it makes or tries to make the attended property more 
determinate. It does not always succeed. And this is so both in the case of 
perceiving and in the case of visualizing. When I attend to something that I 
see in the periphery of my visual field and I cannot move my eyes, the shift of 
my attention tries to make the properties of this object more determinate, but 
because this object is, and continues to be, in the periphery of my visual field, 
I will not succeed (at least not as long as the object is far away enough from 
the fovea). The same goes for mental imagery. If I am asked to visualize my 
first credit card and attend to its color, I may just simply not remember and in 
this case, although attention tries to make the attributed property more deter- 
minate, it may not succeed. 

In short, the difference between perceptual content and the content of 
mental imagery is not a difference between the structure of these contents— 
they have the very same structure. The difference is between the dynamics of 
how the represented properties, and, importantly, the determinacy of the rep- 
resented properties, change in response to the allocation of attention. 

It is important to emphasize that the claim is not that the properties 
attributed in the content of mental imagery are less determinate than the ones 
that are attributed in perceptual content. The properties that constitute the 
content of mental imagery can be very determinate indeed—and most of the 
properties that constitute perceptual content are not particularly determinate 
(see Dennett 1996). The claim is that the difference between the content of 
these two mental states is the way this determinacy comes about. 

We have seen that not everyone is on board when talking about the content 
of perceptual experiences. Some think that perceptual experiences just don't 
have content. Or even if they do, this content definitely does not explain the 
phenomenal character of this experience. These “relationalists” will not accept 
any version of the explanatory scheme I outlined here. 

However, as we have seen, there is little disagreement about the existence 
and importance of early cortical perceptual representations. We have also 
seen that these representations are present both in perception (that is, early 
cortical processing that is directly triggered by sensory input) and in mental 
imagery (that is, early cortical processing that is not directly triggered by sen- 
sory input). So a somewhat distant relative of the explanatory scheme I con- 
sidered above would be to explain the similarity of the phenomenal character 
of perception and of mental imagery with reference to the similarity of the 
content of these early cortical perceptual representations. 

Recall that the standard explanation of the phenomenal similarity between 
perception and mental imagery was in terms of the similarity of the content 
of perceptual states and of mental imagery. What I am proposing now is very 
different: the relevant content is not that of our overall perceptual state and 
our overall mental imagery. It is the content of all the subpersonal represen- 
tations of early cortical perceptual processing (see Martinez and Nanay 
forthcoming). Nonetheless, this is still an explanatory scheme that explains 
the similarity of phenomenology in terms of the similarity of content (that is, 
the content of early cortical representations). 

The phenomenology of mental imagery is similar to the phenomenology of 
perception because there is a similarity of content. But this similarity of con- 
tent can be cashed out in more detail if we consider the content of a variety of 
(conscious and unconscious) perceptual representations. The phenomenol- 
ogy of your mental imagery supervenes on the content of various (conscious 
and unconscious) perceptual representations. And the phenomenology of 
your perceptual state also supervenes on the content of various (conscious 
and unconscious) perceptual representations. And given that the content of 
these representations is very similar in the mental imagery and in the percep- 
tion case, this accounts for the phenomenal similarity between mental 
imagery and perception. 
To sum up, attention and mental imagery are intertwined in various ways 
and they are both equally involved in many mental phenomena. For example, 
they can help us to explain not only how the content of mental imagery differs 
from perceptual content, but also perceptual expectations (see Judge and 
Nanay 2021; see also Chapter 12) and top-down influences on perception, a 
subject I now turn to. 


11 
Top-Down Influences on Perception and Mental Imagery 


One influential debate about perception is about its purity: is perception an 
encapsulated process that is protected from any kind of top-down influence 
or is it influenced and modified by top-down information? 

What complicates this debate, often referred to as the cognitive penetrabil- 
ity debate, is that it is not at all clear what kind of mental state is supposed to 
be doing the penetrating and what kind of mental state is supposed to be pen- 
etrated. In other words, it is not clear what is “top” and what is “below” in the 
debate about top-down influences on perception. 

Once we clarify these conceptual issues, it seems that there is a wealth of 
empirical evidence in favor of the claim that there are indeed some top-down 
influences on perception. But then the question becomes: how does it hap- 
pen? What is the mechanism by which perception is influenced in such a top- 
down manner? 

My answer starts from two claims. If we accept the claim I argued for in
Chapter 9, that the vast majority of perceptual states are a hybrid of sensory
stimulation-driven perception and mental imagery, and if we add that mental
imagery can be (at least partly) determined in a top-down manner, then we
should expect top-down influences on perception, mediated by mental
imagery, given that mental imagery is a crucial ingredient in most perceptual
states.

The main source of conceptual confusion concerning debates about top- 
down influences on perception is that it is not clear what is meant by “percep- 
tion” in this context. Some (especially philosophers, for example, Siegel 2011; 
Macpherson 2012; Stokes 2012; but also psychologists, for example, Firestone 
and Scholl 2014, 2016) take “perception” in this context to be perceptual 
experience: something we are consciously aware of. If we work with this con- 
cept of perception, then the question is whether top-down influences can 
alter the way we experience a scene—the phenomenal character of our expe- 
rience: what it is like to perceive this scene. 
Another way of understanding what is meant by “perception” when we talk
about top-down influences on perception is perceptual processing and espe- 
cially early perceptual processing—something neuroscientists (and also most 
psychologists) worry about. Here the question is whether processing in, say, 
the primary visual cortex is influenced in a top-down manner. 

These two questions are clearly very different—one of them is about phe- 
nomenology and the other is about early perceptual processing. And as 
changes in early perceptual processing are neither necessary nor sufficient for 
changes in perceptual phenomenology, there is no easy traffic between these 
two different sub-debates. 

I am extremely pessimistic about whether the first of these debates could 
ever be resolved in a satisfactory manner. One crucial difference between the 
two positions in this debate is about what is part of our perceptual (as opposed 
to non-perceptual) phenomenology. Those who argue for the existence of 
top-down influences on perception (again, understood here as perceptual 
experience) need to show that there can be two experiences, call them E1 and
E2, that differ only in that there is a top-down influence in E1, which is missing
in E2, and that the two differ in their perceptual phenomenology. So the top-
down influence results in a difference in perceptual phenomenology. Those
who are against the idea of top-down influences on perception (again, on
perceptual experiences) can acknowledge that E1 and E2 differ only in that
top-down influences are present in E1 but absent in E2 and they can also
acknowledge that E1 and E2 differ in their non-perceptual phenomenology—
they only need to deny that they differ in their perceptual phenomenology. So 
the only way of adjudicating between the proponents and the opponents of 
top-down influences on perceptual experience is by having a very clear dis- 
tinction between perceptual and non-perceptual phenomenology (Kriegel 
2007; Siegel 2007; Bayne 2009; Masrour 2011; Nanay 2011a, 2012c, 2013a).

But we are blatantly missing any such very clear distinction. Take the fol- 
lowing example: You're at a dinner party and you're eating what you take to be 
chicken. Then your host tells you that it is in fact rat meat. Your experience, 
presumably, changes. The meat tastes different. This may seem to be an indi- 
cation that your perceptual phenomenology changes—what changes is the 
way the meat tastes to you. But it might also be that what changed was instead 
not the perceptual but the non-perceptual phenomenology in this example 
(that is, the taste itself is strictly speaking the same, but you somehow frame it 
differently). It is difficult to see what could possibly settle this disagreement. 
We may be able to tell whether our overall phenomenology changed. But to 
tell whether this phenomenal change was perceptual or non-perceptual is 
much more difficult. In other words, if I say that the two experiences differ in
their perceptual phenomenology and you deny this, it is not clear how the
issue can be decided. Intuitions differ wildly with regard to what phenomenal
character counts as perceptual.

And this makes the debate about whether there are top-down influences on 
perceptual experiences a very odd debate. Given that it is not clear what per- 
ceptual phenomenology is and how to keep it apart, introspectively, from 
non-perceptual phenomenology, the question about whether perceptual phe- 
nomenology depends on top-down influences relies on this unclear concept 
of perceptual phenomenology (which is difficult to keep apart from non- 
perceptual phenomenology by introspective means). It is unclear then how 
we can make any progress in answering this question—if we take the question 
about top-down influences to be about perceptual phenomenology.¹

So, as a result, I take it that the more interesting (and more clearly 
substantive) debate about top-down influences on perception is about 
whether early perceptual processing is influenced in a top-down manner. 
This is what I take to be the question of “top-down influences on percep- 
tion” in what follows. 

Observant readers may have noticed that I have talked about “top-down 
influences on perception” and not about cognitive penetration so far. There is 
a reason for this. The term “cognitive penetration” suggests that whatever is 
doing the penetration is a cognitive state and this is not something I want to 
be built into the very notion I am analyzing. 

When I talk about “top-down” influences on perception, I want to allow for 
any “top-down” influence—not just those that are labeled “cognitive.” And it
is not very clear why the label “cognitive” is singled out. “Cognitive” can mean 
many things. It is sometimes contrasted with “affective,” but this is clearly not 
something we want to do if we are interested in top-down influences on per- 
ception as there may be affective influences on perception and they may be as 
important as (or more important than) non-affective cognitive influences 
(Schupp et al. 2004; Pessoa and Ungerleider 2005; Schmitz et al. 2009). The 
term “cognitive” is also often contrasted with “conative,” but this is not a useful
usage in the present context either as there may be very good reasons to posit 
top-down influences on perception where it is a desire or an intention that 
influences our perceptual processing (Nanay 2006; Stokes 2012). 


1 As we have seen in Chapter 3, not everyone agrees that there is such a thing as non-perceptual 
phenomenology. This disagreement makes the debate about top-down influences on perceptual phe- 
nomenology even less straightforward. 
Of course, the most straightforward use of “cognitive” may just be one
where it is contrasted with “perceptual,” but this oversimplifies things consid- 
erably. In fact, one reason why it is better to focus on the debate about whether 
there are top-down influences on early perceptual processing than on the one 
about whether there are top-down influences on perceptual phenomenology 
is that if we focus on the latter debate, the only kind of top-down influence we 
can talk about is from non-perceptual mental states (typically beliefs) to men- 
tal states with perceptual phenomenology (that is, perceptual experiences). 
But we have seen that addressing any questions about the presence or absence 
of such top-down influences then requires a very clear distinction between 
perceptual and non-perceptual phenomenology and we don’t have any such 
distinction. 

If, on the other hand, we consider the debate about whether there are top- 
down influences on early perceptual processing, we get a more nuanced pic- 
ture. The question of top-down influences is no longer a yes or no question, as 
in the case of the phenomenology interpretation (either there is cognitive 
penetration or there isn’t), but a multifaceted one. Maybe the primary visual 
cortex is influenced in a top-down manner by V2 and V4, but not by our 
expectations and beliefs. Or maybe it is only influenced by V2. Or maybe also 
by our expectations and beliefs. All of these claims would assert top-down 
influences on early perceptual processing (of which we have very strong 
evidence; see, for example, Gandhi et al. 1999; Murray et al. 2002; O'Connor 
et al. 2002; Douglas and Martin 2007; Muckli 2010), but it matters a lot what 
kind of top-down influences they are (see Lupyan et al. 2010; Block 2014; 
Vetter and Newen 2014; see also Teufel and Nanay 2017 for a detailed analysis 
of various kinds of top-down influences on early cortical perceptual processing 
and the differences between those top-down influences that come from within 
the visual system and those that come from post-perceptual processing). 

How can we be sure that we have discovered a top-down influence on per- 
ception? Suppose that you vary the state of a higher-level mental process and 
this leads to changes in the primary visual cortex. Does this count as evidence 
for top-down influences on perception? 

Clearly not. If I turn my head to the left, this will influence the state of my 
primary visual cortex: the information that gets processed in my primary 
visual cortex will be very different. Or if I close my eyes, this will have an even 
more obvious influence on my primary visual cortex. So we need an addi- 
tional condition: namely, that the sensory stimulation needs to remain con- 
stant. If the sensory stimulation (say, the retinal image) is constant, and a 
change in a higher-level mental process causes a change in a lower-level
perceptual process, we have evidence for top-down influences on perception.²

This seemingly not very substantive clarification is in fact very important 
when it comes to attention. Attentional effects are often ruled out of the dis- 
cussion of top-down influences on perception (Pylyshyn 1999; Siegel 2011). 
And, at least on some ways of understanding attention, rightly so. 

Overt shifts of attention are ones that are accompanied by eye movements. 
So if the primary visual cortex changes as a result of a difference in overt
attention, this does not count as a top-down influence because overt attention
changes the sensory stimulation and the primary visual cortex changes as a 
result of this change in sensory stimulation. So we are right to rule out overt 
attention as a top-down influence on perception (that is, on early perceptual 
processing). 

Covert shifts of attention, as we have seen in Chapter 10, are ones that are 
not accompanied by eye movements (Posner 1980, 1984; Posner et al. 1984; 
see also Findlay and Gilchrist 2003). So we can keep the sensory stimulation
fixed and change the covert attention and if this leads to changes in the pri- 
mary visual cortex, this would indeed count as a top-down influence on per- 
ception (Mole 2015; Wu 2017; Stokes 2018). In short, while overt attention 
should not be considered to be a source of potential top-down influence on 
perception, covert attention should be taken to be one of the prime examples 
of such a top-down influence. 

Unsurprisingly, in the context of the book, I want to focus on a different 
kind of mental phenomenon and the role it plays in top-down influences on 
perception (without dismissing the importance of covert attention), namely, 
mental imagery. 

As we have seen in Chapter 3, some ways of exercising mental imagery are 
subject to top-down influences: the apple I visualize will look different 
depending on my template for what apples look like—which, in turn, pre- 
sumably depends on what kinds of apples I have encountered in my life. If all 
these apples were red, I am likely to visualize a red apple. We have also seen in 
Chapter 8 that amodal completion can also be subject to top-down influences. 
But then if, as I argued, many of the properties of perceived objects are really 
represented by mental imagery, and the mental imagery that is involved in 


2 Note that this characterization does not cover adaptation effects or perceptual learning, where the
sensory stimulation is constant but the change in lower-level perceptual processing is not caused by 
higher-level mental processes. 
this is, in turn, influenced in a top-down manner, what we should expect is
that perception is very much subject to top-down influences. 

In other words, the importance of mental imagery in everyday perception 
gives us very strong reasons to allow for top-down influences on perception. 
We have seen that what we pre-theoretically take to be perception is in fact a 
mixture of perception and mental imagery. And at least some of the mental 
imagery in the mix can be (but doesn’t have to be) subject to top-down influ- 
ences. But then it follows that most perceptual states can also be subject to 
top-down influences. 

It is important not to overestimate the scope of this claim: I am not claim- 
ing that all perceptual states are influenced in a top-down manner. I’m not 
even saying that all perceptual states where amodal completion plays any role 
are influenced in a top-down manner. We have seen in Chapter 8 that while 
mental imagery plays a role in most perceptual states, there are exceptions— 
for example simple two-dimensional displays, where the perceptual state is 
fully determined by sensory stimulation-driven perception. 

Further, there are many forms of mental imagery that do not depend on 
top-down information. The kind of mental imagery our perceptual system 
uses to fill in the blind spot is one clear example. The perceptual processing of 
information that would correspond to the blind spot is not directly triggered 
by sensory input in the visual sense modality because there is no sensory 
input that would directly trigger such perceptual processing: the blind spot
has no receptors. But this perceptual processing is determined laterally by 
perceptual processing of the sensory information coming from those parts of 
the retina that surround the blind spot. No top-down influence is needed and 
we have no evidence that there are any top-down influences on this form of 
mental imagery. 

The picture we ended up with is one where perceptual processes consist of 
a sensory stimulation-driven and a non-sensory stimulation-driven compo- 
nent (where by sensory stimulation-driven, I mean directly driven by sensory 
input of the relevant sense modality). In other words, perception consists in 
mental imagery and stimulation-driven perception. And mental imagery 
influences the way the stimulation gets processed. In some very rare examples 
of simple two-dimensional visual displays, the mental imagery component 
may be missing. But in the vast majority of perceptual scenarios, it is present 
and it gets combined with sensory stimulus-driven perceptual processing. 
And in these cases, much of what we take ourselves to perceive we really 
partly represent by means of mental imagery. And as at least some of these 
episodes of mental imagery are subject to top-down influences, perception 
per se can also be subject to top-down influences (see also Dijkstra et al.
2017b for some empirical support). 

We have seen that one advantage of talking about top-down influences on 
perceptual processing over talking about top-down influences on perceptual 
experience is that it allows for a more nuanced picture of what is the “top” in 
these top-down influences. So when I argue that many instances of mental 
imagery can be subject to top-down influences and, as a result, many instances 
of perception can also be subject to top-down influences, I should say some- 
thing about what I take to be the “top” in these top-down influences—what 
these higher-order mental states are that influence mental imagery and by 
doing so also influence perception. 

And one notorious problem in arguing about top-down influences is that 
the behavioral or neuroimaging data very often underspecify where the infor- 
mation that is used to enrich or specify the bottom-up processing comes 
from. Take the example of Figure 8 in Chapter 8 on p. 61, which I used as 
evidence for top-down influences on amodal completion. My explanation 
was that the fact that we have early perceptual processing of the illusory con- 
tours is explained by top-down influences on this instance of visual mental 
imagery. But someone could object that the information about the horse's 
illusory contours is not coded anywhere “top.” It is coded, the objection would
go, in the perceptual system itself. It is true that this information is learned, 
but it was learned by means of perceptual learning. So our amodal comple- 
tion is not at all influenced in a top-down manner in this example—it is influ- 
enced, laterally, by information present in the perceptual system. 

I think this is a valid move against a large number of claims about top- 
down influences on perceptual phenomenology. But it is not a valid move 
against claims about top-down influences on early perceptual processing. 
While it is very difficult to tell whether the information about the illusory 
contours of the horse is something that is coded in the perceptual system, 
what we can say with great certainty is that this piece of information is not 
coded in the primary sensory cortex. So if this information (about the illu- 
sory contours of the horse shape) influences the way the primary visual cortex 
functions (as is the case when it comes to amodal completion), we do have 
good reason to conclude that this influence is indeed top-down: it comes 
from somewhere further up from the primary visual cortex and it influences 
the primary visual cortex. A genuine top-down influence. Just how far up this 
information comes from we can leave open. In the case of the Kanizsa triangle, 
we have evidence that the completion of the illusory contours in the primary 
visual cortex is subject to top-down influences from the secondary visual 
cortex (Qiu and von der Heydt 2005). This still counts as a top-down influence
on early perceptual processes. 

Some champions of cognitive penetration (with emphasis on “cognitive”) 
would, no doubt, be disappointed with this version of top-down influences on 
perception as nothing I have said here shows that very high-level mental 
states, such as explicit beliefs, would have top-down influences on early
perceptual processing. We have seen that mental imagery can be more or less
top-down driven. But then it is possible that when it is more top-down driven, 
it influences relatively late stages of perceptual processing, whereas when it is 
less top-down driven, it influences relatively early stages of perceptual
processing. Nothing I have said here rules out this possibility.³


3 This is an important difference between my account of the relation between mental imagery and
“cognitive penetration” and Fiona Macpherson’s. Macpherson argues that the cognitive penetration of
(color) perception is always mediated by “non-perceptual states with phenomenology” (Macpherson
2012, pp. 49-58). This category of “non-perceptual states with phenomenology,” for Macpherson,
encompasses imagination, dreaming, and hallucination (Macpherson 2012, p. 50)—in this sense it 
might be thought to be similar to what I mean by mental imagery (which also encompasses these 
three mental processes). But for Macpherson these states are necessarily conscious and mediate cogni- 
tive penetration by virtue of their phenomenal character (Macpherson 2012, pp. 51-3)—which is very 
much against the spirit of my own proposal. 


12 
Temporal Mental Imagery 


I defined mental imagery as perceptual representation that is not triggered 
directly by sensory input. If there is no sensory input and the early cortical 
representation still happens, this is mental imagery. If there is sensory input, 
but it does not directly trigger this early cortical representation, this still 
counts as mental imagery. 

How can we apply this definition to the temporal case? Suppose that you 
have a sensory input, say a triangle in the middle of your retina, which then 
directly triggers a perceptual representation, say a V1 representation of an 
isomorphic triangle in the middle of your visual field. This is not mental 
imagery. We know a fair amount about the time frame of this activation. Visual 
sensory stimulation reliably leads to V1 activation in 30 milliseconds (see 
Rolls and Tovee 1994; Thorpe et al. 1996; Rauschenberger et al. 2006 for
summaries). So if you have visual input at time T1 and then at T1 + 30 milliseconds,
you have V1 activation, this is sensory stimulation-driven perception. 

In contrast, if you have the same sensory input (of the triangle) at T1, but
the tokening of the perceptual representation happens either earlier or
(significantly) later than T1 + 30 milliseconds, then this would count as temporal
mental imagery, because this perceptual representation is not triggered
directly by the sensory input. If it happens earlier than T1 + 30 milliseconds,
then it is temporal mental imagery because the perceptual representation is
not triggered by the sensory input (as causation has a temporal dimension).
And if the tokening of the perceptual representation happens (significantly)
later than T1 + 30 milliseconds, then it is temporal mental imagery because it
is triggered by the sensory input indirectly, that is, by the mediation of another
representation (which, in turn, was tokened at T1 + 30 milliseconds).

A helpful shortcut to assessing whether perceptual processing is sensory 
stimulation-driven perception or temporal mental imagery would be to 
appeal to the concept of temporal correspondence. If the perceptual represen- 
tation is triggered by temporally corresponding sensory input (where “tem- 
porally corresponding” means “being preceded by 30 milliseconds”), it is 
sensory stimulation-driven perception. If this temporal correspondence is 
missing, it is temporal mental imagery. We have seen in Chapter 1 that spatial 
correspondence is a good guide to assessing the directness of the causal link
between sensory input and perceptual processing, at least when it comes to 
some properties like shapes. In this chapter, for the sake of the simplicity of 
exposition, I will use a similar maneuver involving temporal correspondence. 

If we have V1 activation but no visual sensory stimulation that would have 
preceded this V1 activation by 30 milliseconds, then there is no temporal cor- 
respondence. The perceptual processing (in V1) is not triggered by tempo- 
rally corresponding sensory stimulation. We have an instance of temporal 
mental imagery. Temporal mental imagery is perceptual processing that is 
triggered by spatially corresponding sensory stimulation in the appropriate 
sensory modality, but where this perceptual processing does not temporally 
correspond with the incoming stimulation. 

In other words, even if there is spatial correspondence between the sensory 
input and perceptual processing, if the temporal correspondence is missing, 
we have a form of mental imagery: temporal mental imagery. For instance, in 
the visual case, if the perceptual processes retinotopically correspond to the 
sensory stimulation but fail to correspond to the timing of the sensory stimu- 
lation, we have temporal mental imagery. 

Temporal correspondence can fail in two directions: the perceptual pro- 
cessing may come earlier than it should—this is a case of “predictive temporal 
mental imagery.” Or it may come later than it should—this would amount to 
“postdictive temporal mental imagery” (see Viera and Nanay 2020 for more 
on both predictive and postdictive temporal mental imagery).¹
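
The temporal correspondence test described above can be put as a small decision rule. What follows is only an illustrative sketch, not a model from the empirical literature: the function name, the labels, and in particular the tolerance for what counts as "significantly" later are assumptions introduced here, since the text fixes only the 30-millisecond latency.

```python
# Illustrative sketch only. The names and the tolerance value are assumptions
# made for this example; only the 30 ms latency comes from the chapter.
V1_LATENCY_MS = 30.0       # stimulation-to-V1 latency cited in the chapter
LATE_TOLERANCE_MS = 5.0    # hypothetical cutoff for "significantly later"


def classify_activation(stimulus_ms, activation_ms,
                        latency_ms=V1_LATENCY_MS,
                        late_tolerance_ms=LATE_TOLERANCE_MS):
    """Label an early cortical activation relative to a sensory input.

    stimulus_ms:   time of the sensory input
    activation_ms: time of the early cortical (e.g. V1) activation
    """
    expected_ms = stimulus_ms + latency_ms
    if activation_ms < expected_ms:
        # Too early to have been triggered by this input.
        return "predictive temporal mental imagery"
    if activation_ms > expected_ms + late_tolerance_ms:
        # Too late to have been triggered directly by this input.
        return "postdictive temporal mental imagery"
    # Temporal correspondence holds.
    return "sensory stimulation-driven perception"
```

On this sketch, an activation 30 milliseconds after the input counts as sensory stimulation-driven perception, whereas one that arrives 10 or 80 milliseconds after the input counts as predictive or postdictive temporal mental imagery, respectively. Where exactly the late cutoff falls is left open by the text, which is why it appears here as an adjustable parameter.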

One important advantage of this way of thinking about temporal mental 
imagery is that it can help us to explain a recurring theme in thinking about 
the experience of time. This was summarized memorably by William James, 
who writes: “the practically cognized present is no knife-edge, but a saddle- 
back, with a certain breadth of its own” (James 1890, p. 609). 

In other words, our experiences have a certain temporal thickness. But 
what does this mean exactly? Here is a more contemporary philosophical spin
on what James had in mind: “The dynamic content of our experience at short
timescales is metaphysically dependent on the content of experience over 
longer timescales” (Phillips 2011b, p. 808). 


1 A lot has been said in the philosophy of perception about the case of seeing a long-extinguished 
star. We see it now, but the star no longer exists. It is important to stress that this would not count as 
an instance of temporal mental imagery. While there is a time delay in both the case of seeing a distant 
star and in (postdictive) temporal mental imagery, in the latter case, this time delay is between the 
retina and the early cortical representations, whereas in the former case, the time delay is between the 
star and the retina. Seeing a long-extinguished star is not temporal mental imagery. 
Figure 10 The “saddle-back” of temporal perception


So when we have an experience of, say, watching a football fly through the air 
and bounce off the goalpost, our experience should not be characterized as 
a sequence of dimensionless point-like experiences. Rather, our experience
of the ball right now somehow represents the ball a split second ago and also 
represents where the ball would be in a split second. This phenomenon is 
often described, following James, as the “specious present.” 

Here is the saddleback William James talks about (Figure 10—note that the 
two flanks of the bell-shape are not necessarily symmetrical). The middle of 
the saddle would be the present moment. But we somehow represent the two 
flanks of this bell-shape as well. The question is: how? 

This raises some deep issues about the nature of perception. How is it pos- 
sible to perceive something that is not present? The ball a split second ago is 
no longer present. And the ball in a split second is not present yet. According 
to an influential line of thought in philosophy, we can only perceive what is 
there to be perceived (for example, Grice 1961). But when it comes to time, 
only the present is present. So the past, let alone the future, can’t be perceived.

There are various sophisticated ways of dealing with this problem (for 
example, extending the temporal dimension of not just the content, but also 
the vehicle of perceptual representations, see Phillips 2011b). But if we take the
concept of temporal mental imagery seriously, then there is no need to com- 
plicate things unnecessarily. 

We represent the flanks of the bell-shape by means of temporal mental 
imagery. We know that the early cortical processing of a temporal event has a 
much wider temporal profile than the retinal event. So some of this percep- 
tual processing will be triggered by corresponding sensory stimulation (the 
middle), but most of it will not be. It will count as temporal mental imagery, 
where the early cortical processing is not triggered by temporally correspond- 
ing sensory stimulation. It is triggered by sensory stimulation that is either 
too early or too late. 

The further away we veer from sensory stimulation-driven perceptual 
processing (the middle of the saddle shape), the bigger role temporal mental 
imagery will play. This could be thought of as the temporal equivalent of the
claims I made in Chapter 3 about peripheral vision. In the case of peripheral 
vision, the part of the scene that is in the fovea is represented by means of 
sensory stimulation-driven perception, but the further away a part of the 
scene falls from the fovea, the more significant role mental imagery has in 
representing it. Similarly, in the case of temporal mental imagery: the present 
is represented by means of sensory stimulation-driven perception, but the 
further we depart from the present (either in the direction of the future or in 
the direction of the past), the more significant role temporal mental imagery 
plays. The specious present is a hybrid perception/mental imagery state. 

This way of thinking about the specious present can help us to make prog- 
ress in the debate about the perception of time. Suppose that you experience 
an event with two temporal parts, A and B, where B follows A. What can we 
say about our experience of these two temporal parts? 

There are two dominant philosophical theories that aim to answer this
question: extensionalism and retentionalism. According to extensionalism,
the experience of this event consisting of two temporal parts A and B is 
decomposable into the experience of A and the experience of B. And the tem- 
poral extension of the vehicle matches the temporal extension of the content 
(Dainton 2008; Hoerl 2009; Phillips 2011b). Roughly, first you have vehicle 
(A) with content (A) and then you have vehicle (B) with content (B). 

The alternative, retentionalist view denies both of these claims: the experi- 
ence of the event consisting of two temporal parts A and B is not decompos- 
able into the experience of A and the experience of B. And the temporal 
extension of the vehicle does not need to match the temporal extension of the 
content (Broad 1923; Lee 2014). 

Without addressing the strengths and weaknesses of these two alternative 
theories, it needs to be pointed out that my view could be thought of as a mix 
and match between these two established views. Just like the extensionalist, 
but unlike the retentionalist, I claim that the experience of the event consist- 
ing of two temporal parts A and B is decomposable into the experience of A 
and the experience of B. But like the retentionalist, and unlike the extension- 
alist, I claim that the temporal extension of the vehicle does not need to match 
the temporal extension of the content. 

I have been using the term “experience” in the last couple of paragraphs 
because the proponents of extensionalism and retentionalism are mainly 
interested in conscious perception of time, or temporal experience, but as 
with all perception, time perception may also be conscious or unconscious, so 
I will rephrase the two claims as follows: the representation of the event con- 
sisting of two temporal parts A and B is decomposable into the representation 
of A and the representation of B. And the temporal extension of the vehicle of
these representations does not need to match the temporal extension of the 
content. 

The first claim was that the representation of the event consisting of two tem- 
poral parts A and B is decomposable into the representation of A and the repre- 
sentation of B. The main novelty of my account, in comparison with 
extensionalism, is that one of these representations is mental imagery. When A is 
happening, it is represented by means of sensory stimulation-driven perception, 
whereas B is represented by temporal mental imagery. And when B is happen- 
ing, it’s the other way round: B is represented by means of sensory stimulation- 
driven perception, whereas A is represented by temporal mental imagery. 

The second claim was that the temporal extension of the vehicle of these 
representations does not need to match the temporal extension of the con- 
tent. The extensionalist move of spreading out not just the content, but also 
the vehicle of temporal representation (or, as they would say, temporal experi- 
ence) was motivated by the consideration outlined above, namely that per- 
ception can't represent something that is not (temporally) present. So when A 
is happening, we can't perceptually represent B (which hasn't happened yet). 
And when B is happening, we can't perceptually represent A (which has 
already happened). The extensionalist solution to this problem is to posit a 
series of perceptual states that each represent the temporal parts of the event. 
This solution comes with a price (see Lee 2014; see also the classic objections 
in Dennett 1991). 

But if we take the role of temporal mental imagery in time perception seri- 
ously, we do not need to posit temporally extended vehicles (with all the 
problems this entails). While perception may or may not represent only the 
present, mental imagery can clearly represent something that is not present. 
Importantly, temporal mental imagery can represent something that has just 
happened or that is about to happen. That is the reason why my account does 
not require that the temporal extension of the vehicle of temporal representa- 
tions match the temporal extension of their content. Some proponents of 
retentionalism insist (as a critical point against extensionalism) that our prior 
sensory responses to the world must leave behind traces in the current state 
of the perceptual system that can be usefully integrated with the current 
incoming sensory signals (Lee 2014). My account identifies what these left- 
behind traces actually are: they would amount to temporal mental imagery. 

Temporal mental imagery can represent the past or the future. I will go 
through a number of examples where it represents the past in Chapters 20 
and 21. But I want to give a taste of what it amounts to when temporal mental 
imagery represents the future (see also Moulton and Kosslyn 2011). 
We have seen that prior expectations of a specific stimulus evoke a feature- 
specific pattern of activity in V1, similar to that evoked by the corresponding 
actual stimulus (Kok et al. 2014, 2017). The finding I want to focus on is about 
how the beginning of a familiar sequence triggers a time-compressed wave of 
activity of the rest of the familiar sequence in V1 (Ekman et al. 2017). 

In this study, the experimenters familiarized subjects with a particular dot 
sequence (a dot moving from the top-left of a monitor to the top-right) and 
using high-speed fMRI they were able to successfully measure activation in 
V1 for retinotopic locations corresponding with the incoming sensory stimu- 
lation. In this way, they were able to map out the trajectory of the dot sequence 
in V1. They then scanned the visual cortices of subjects in two distinct condi- 
tions. In the control condition they presented subjects with an initial display 
in which the dot was located at the end-location of the familiarization 
sequence (that is, the dot was shown in the top-right). And there was activa- 
tion in V1 in the location of the presented stimulus. In the preplay condition, 
in contrast, subjects were presented with an initial display in which the dot 
was located at the start location of the familiarization sequence (that is, the 
dot was shown in the top-left). Interestingly, in this condition subjects showed V1 
activation that amounted to a time-compressed trajectory of the entire famil- 
iarization sequence. That is, V1 encoded a dot moving from the top-left to the 
top-right of the display. The response was time-compressed in the sense that 
the cortical processes traced out the expected trajectory of the dot more 
quickly than they would if they were responding to the actual dot sequence. 
Furthermore, it was shown that when this cortical preplay was elicited by the 
initial dot display, the subsequent detection performance for the location of 
the dot along that trajectory was enhanced. 

In this case, we have predictive (or anticipatory) temporal mental imagery in 
that we have early cortical perceptual processes that do not temporally corre- 
spond to the relevant sensory stimulation. The V1 activity occurs prior to the 
relevant sensory stimulation (see also De Lange et al. 2018 on how expectations 
modify early cortical processing, both before stimulus onset and after; see also 
Diekhof et al. 2011, who call this “anticipatory imagery”). Here the temporal 
mental imagery represents what is about to happen. Chapters 20 and 21 are 
about cases where the temporal mental imagery represents what has happened.’ 


> An interesting case where forward-looking and backward-looking temporal mental imagery 
combine is sequence memory (see Allen et al. 2014 for a summary). When remembering a tune, for 
example, what we remember is which notes will follow which ones: we do not remember the entire 
sequence when we remember the first note. Rather, at each point of the recall of the sequence, we only 
remember the next couple of notes in the sequence. This would amount to the backward-looking 
temporal imagery of a forward-looking temporal imagery sequence. 


PART III 
MULTIMODAL PERCEPTION 


13 
Multimodal Mental Imagery 


There is a lot of recent evidence that multimodal perception is the norm and 
not the exception—our sense modalities interact in a variety of ways (see 
Sekuler et al. 1997; Vroomen et al. 2001; Bertelson and de Gelder 2004; 
Spence and Driver 2004 for summaries; and O’Callaghan 2008a, 2011 as well 
as Macpherson 2011 for philosophical overviews). Information in one sense 
modality can influence and even initiate information processing in another 
sense modality at a very early stage of perceptual processing (even in the pri- 
mary visual cortex in the case of vision, for example; see Watkins et al. 2006). 

A simple example is ventriloquism, which is an illusory auditory experi- 
ence influenced by something visible (Bertelson 1999; O'Callaghan 2008b). It 
is one of the paradigmatic cases of crossmodal illusion: We experience the 
voices as coming from the dummy, while they in fact come from the ventrilo- 
quist. The auditory sense modality identifies the ventriloquist as the source of 
the voices, while the visual sense modality identifies the dummy. And, as it 
often (not always—see O’Callaghan 2008b) happens in crossmodal illusions, 
the visual sense modality wins out: our (auditory) experience is of the voices 
coming from the dummy. 

More generally, early cortical processing in one sense modality can be trig- 
gered, in the absence of sensory stimulation in this sense modality, by cross- 
modal influences from another sense modality (Calvert et al. 1997; Zangaladze 
et al. 1999; James et al. 2002; Pekkola et al. 2005; Ghazanfar and Schroeder 
2006; Mast et al. 2006; Martuzzi et al. 2007; Hertrich et al. 2011; Kilintari et al. 
2011; Hirst et al. 2012; Iurilli et al. 2012; Muckli and Petro 2013; Chan et al. 2014; 
Vetter et al. 2014). These are bona fide examples of perceptual processing that 
is not directly triggered by sensory input in this sense modality. 

When I am looking at my coffee machine that makes funny noises, I per- 
ceive this event by means of both vision and audition. But very often we only 
receive sensory stimulation from a multisensory event by means of one sense 
modality. If I hear the noisy coffee machine in the next room, that is, without 
seeing it, then the question arises: how do I represent the visual aspects of this 
multisensory event? 
I argue that in cases like this one, we have multimodal mental imagery: 
perceptual processing in one sense modality (here: vision) that is triggered by 
sensory stimulation in another sense modality (here: audition). Multimodal 
mental imagery is neither a rare nor an obscure phenomenon (see Nanay 
2018a for an overview). The vast majority of what we perceive are multisen- 
sory events: events that can be perceived in more than one sense modality— 
like the noisy coffee machine. In fact, there are very few perceived events 
that are not multisensory in this sense. And most of the time we are only 
acquainted with these multisensory events via a subset of the sense modalities 
involved—all the other aspects of these multisensory events are represented 
by means of multimodal mental imagery. This means that multimodal mental 
imagery is a crucial element of almost all instances of everyday perception. 

More slowly: Most of what we perceive, we perceive with more than one 
sense modality. I call sensory individuals that can be perceived with more 
than one sense modality multisensory individuals.¹ 

As we have seen in Chapter 6, different sense modalities may have different 
sensory individuals. In the case of vision, the debate is about whether the sen- 
sory individuals of vision are ordinary objects or spatiotemporal regions. And 
in the case of audition, the debate is between those who take sensory individ- 
uals to be sounds and those who take them to be ordinary objects (or maybe 
events). Similarly for olfaction: odors or ordinary objects? And it is not clear 
what exactly “ordinary objects” are supposed to be either. To make things 
even more complicated, we perceive some individuals both by vision and by 
audition (and maybe also by olfaction). 

So, for simplicity, I will just say that what we perceive are individuals. Just 
what kind of individuals they are, I want to leave open. Events are individuals 
and so are entities. What interests me here is what happens if we perceive 
individuals that can be perceived with more than one sense modality. I call 
these sensory individuals multisensory individuals. Note the difference between 
the terms “multisensory” and “multimodal,” as I use them. Multisensory merely 
means something we can perceive by two or more sense modalities. 
Multimodality will have to do with the interaction of these senses—something 
not at all presupposed by the concept of multisensory. Further, when I talk 
about perception in this context, what I mean is perception that is not neces- 
sarily conscious. So, multisensory individuals are the individuals we represent 


1 The mereology of multisensory individuals can get tricky. If one takes the sensory individuals of 
audition to be sounds and of olfaction to be odors, then the multisensory individual would be some 
kind of mereological sum of the sound, the odor and the ordinary object. See O'Callaghan 2015a for a 
good summary of the options here. 
perceptually (consciously or unconsciously) as having properties by means of 
more than one sense modality. 

The question is this: what happens when we perceive multisensory individ- 
uals? This should be a central question for any account of perception, given 
that most of what we perceive are multisensory individuals. People are multi- 
sensory individuals—we can see them, hear them, smell them, touch them, 
maybe taste them too. And the same goes for most of the objects and events 
around us. 

Note the reference to abilities in the way I characterized multisensory indi- 
viduals: just because we can perceive something with more than one sense 
modality, it doesn’t mean that we do so. Very often we perceive multisensory 
individuals with one sense modality only. And this is where mental imagery 
again plays a crucial role. 

Multimodal mental imagery is mental imagery that is triggered by sensory 
stimulation in another sense modality (see Lacey and Lawson 2013 for a vari- 
ety of examples). If perceptual processing is directly triggered by sensory 
input, we get sensory stimulation-driven perception. If it is triggered— 
indirectly—by sensory stimulation in another sense modality, we get multi- 
modal mental imagery. If it is triggered—indirectly—by something else, we 
get some other kind of (non-multimodal) mental imagery. In short, multi- 
modal mental imagery is mental imagery in one sense modality induced by 
sensory stimulation in another sense modality.’ 

One might wonder: Why posit (unconscious) multimodal mental imagery 
in these cases, rather than just denying that there is any mental imagery at all? 
The answer should be clear in the light of the definition of mental imagery: 
even though you might not be aware of it, your early perceptual processing in 
one sense modality is triggered by sensory stimulation in another sense 


2 Much of this chapter is about the intricate connections between different sense modalities. 
Nonetheless, in the definition of multimodal mental imagery, I am relying on the difference between 
perceptual processing in different sense modalities. It is important to emphasize that there is no ten- 
sion between these two claims—in spite of all the intricate links between the perceptual processing in 
different sense modalities, we can nonetheless identify what distinctively visual perceptual processing 
amounts to (and this is consistent, in the context of mental imagery, with more than one sense modal- 
ity contributing to mental imagery in one sense modality; see Hubbard 2013). Multimodality does not 
imply that there are no distinct sense modalities. See Chapter 14 for more on this worry. 

3 A brief terminological remark: the reference to multimodality in the label “multimodal mental 
imagery” does not refer to the multimodality of our phenomenology when we have multimodal men- 
tal imagery. What “multimodal” refers to in the name of multimodal mental imagery is the etiology of 
mental imagery: mental imagery is the product of the interaction between (at least) two different 
sense modalities. The phenomenal feel of multimodal mental imagery, if there is one, may itself be 
unimodal, say, purely visual. But it is the outcome of the interaction between vision and another sense 
modality—it is multimodal in this sense. Another term that is used in the literature to refer to this 
phenomenon is crossmodal mental imagery, see Spence and Deroy 2013. 
modality. And we have strong empirical evidence that this is so (see Calvert 
et al. 1997; Zangaladze et al. 1999; James et al. 2002; Pekkola et al. 2005; 
Ghazanfar and Schroeder 2006; Martuzzi et al. 2007; Hertrich et al. 2011; Kilintari 
et al. 2011; Hirst et al. 2012; Iurilli et al. 2012; Muckli and Petro 2013; Chan 
et al. 2014; Vetter et al. 2014 for findings in various combinations of sense 
modalities). 

Just a couple of quick examples: Priming subjects with auditory stimuli 
enhances visual discrimination (Chen and Spence 2011a, 2011b). And if 
blindfolded subjects listen to different (familiar) sounds, their V1 activity is 
different (Vetter et al. 2014; the same is true of blind subjects, see Vetter et al. 
2020). Subjects were blindfolded and they listened to distinctive sounds— 
birds chirping, people chattering, cars driving by. And their primary visual 
cortex was scanned while they were listening to these sounds. The crucial 
result is that these sounds could be distinguished on the basis of the activities 
of the primary visual cortex alone. So each time we hear some kind of sound 
we are familiar with, multimodal visual imagery (in the sense of visual pro- 
cessing triggered by auditory input) gets triggered. More generally, manipula- 
tions of the sensory input in one sense modality systematically influence 
perceptual processing (and often phenomenology) in another sense modality. 

Let’s go back to the noisy coffee machine. When I am looking at my coffee 
machine that makes funny noises, this is an instance of multisensory percep- 
tion—my coffee machine is a multisensory individual. And if I hear the noisy 
coffee machine in the next room, that is, without seeing it, then I represent the 
visual parts of this multisensory individual by means of multimodal mental 
imagery. 

Given that most of the individuals we encounter are multisensory individ- 
uals and given that our perceptual access to these multisensory individuals is 
rarely absolute (that is, encompassing all relevant sense modalities), this hap- 
pens very often. Multimodal mental imagery is the norm, not the exception. 

I argued in Chapter 9 that the vast majority of perceptual states would in 
fact be a hybrid of sensory stimulation-driven perception and mental imagery. 
I used considerations about amodal completion in the argument there, but 
note that we can make an even stronger case for this claim if we take the mul- 
timodal nature of perception into consideration. 

Multimodal mental imagery is, in some ways, a generalization of the 
amodal completion case, and the kind of mental imagery that is involved is 
similar to the mental imagery involved in amodal completion. It is involun- 
tary and localizes in one’s egocentric space. It is also normally unconscious, 
but when it is not, it is sometimes (not always) accompanied by the feeling of 
presence. 

Think of multisensory individuals as mereologically complex individuals. 
They have many parts. Some of these parts are perceived visually, some others 
are perceived auditorily, for example. If we think of multisensory individuals 
this way, it makes multimodal mental imagery very similar to amodal com- 
pletion. Amodal completion is the representation of those parts of a sensory 
individual we get no sensory stimulation from. And multimodal mental 
imagery is the representation of those parts of a multisensory individual we 
get no sensory stimulation from. It’s just that different parts of this multisen- 
sory individual are accessible by different sense modalities (see O'Callaghan 
2015a, 2015b). 

Most of the time, when we form mental imagery of those parts of a multi- 
sensory individual that we are not acquainted with, this mental imagery will 
be unattended and unconscious. But if we are really interested in them, we 
can attend to them. And such attentional shift may even make some part of a 
multisensory individual conscious. Further, while most of the time the prop- 
erties we attribute to those aspects of the multisensory individual that we are 
not acquainted with are very determinable, we can make them more determi- 
nate (again, if we attend to them). Multimodal mental imagery, like mental 
imagery in general, may be attended or unattended, conscious or uncon- 
scious, and determinate or indeterminate (see Chapter 10).* 

Suppose that I am working in my room and I hear footsteps from down- 
stairs (without seeing who is coming upstairs). I represent the complex multi- 
sensory event of someone coming upstairs: I perceive the auditory parts of 
this event and I represent the other (visual, maybe olfactory) parts of this 
event by means of mental imagery. But my visual and olfactory multimodal 


* Sensory individuals are individuals we perceptually attribute properties to. This property attribu- 
tion can be conscious or unconscious. It is important to keep the concept of “sensory individual” apart 
from that of “perceptual object” (see O'Callaghan 2014; Spence and Bayne 2014). Perceptual objects 
are individuals we consciously experience perceptually. There is a debate about whether perceptual 
objects are multimodal (Nudds 2014; Spence and Bayne 2014). It has been suggested that when we 
perceive multisensory events, our conscious perception is not multimodal: it is unimodal and it oscil- 
lates between the, say, auditory and the visual aspects of these events, never consciously perceiving 
both simultaneously (Spence and Bayne 2014). Against this, others argued that when we perceive 
multisensory events, our conscious perception is multimodal: we consciously and simultaneously per- 
ceive both the auditory and the visual aspects of this event (O’Callaghan 2014). Nothing I have said 
here takes sides in this debate: everything I have said is compatible with a multimodal or an oscillation 
view about conscious perception. Regardless of how the different mereological parts of the multisen- 
sory individuals show up in consciousness, our (conscious or unconscious) perception represents 
both simultaneously. And when one of these parts is missing, it continues to do so. 
mental imagery may not be conscious—if I am not too concerned with who is 
coming upstairs. My olfactory mental imagery of the olfactory aspects of the 
multisensory event whose auditory aspects I am acquainted with is likely to 
be unattended, unconscious and very determinable. But if the only two peo- 
ple who can come upstairs are my stinky friend X or my other friend, Y, who 
uses very nice perfume, and if I really want to know which one it is, I will be 
likely to fill in the olfactory aspects of the multisensory event in a more deter- 
minate way (which can prime me to recognize them by smell more quickly) 
(see Berger and Ehrsson 2013, 2014, 2018 for more on the way mental 
imagery and multimodal integration interact). 

Here is a nice experimental illustration of this point. The double flash illu- 
sion is one of the most striking crossmodal illusions: you are presented with 
one flash and two beeps simultaneously (Shams et al. 2000). So the sensory 
stimulation in the visual sense modality is one flash. But you experience two 
flashes and already in the primary visual cortex, two flashes are processed 
(Watkins et al. 2006). This means that the double flash illusion is really about 
multimodal mental imagery: in the case of the second flash, we have percep- 
tual processing in the visual sense modality (again, already in V1) that is not 
directly triggered by sensory input in the visual sense modality (but by sen- 
sory stimulation in the auditory sense modality). 

The multimodal mental imagery that is involved in the double flash illusion 
is conscious, involuntary, accompanied by the feeling of presence, and local- 
izes in egocentric space. It is accompanied by the feeling of presence so much 
that we do take ourselves to perceive two flashes, not one. 

Recall that mental imagery in general can be triggered laterally or in a 
top-down manner. Multimodal mental imagery is, in some very real sense, 
triggered laterally: the auditory processing in the early sensory cortices is 
triggered by visual processing in the early sensory cortices. But this leaves 
open the question about whether any top-down influences are involved in 
multimodal mental imagery. 

One example of multimodal mental imagery where no top-down influence 
plays any role comes from the double flash illusion. As we have seen, in this 
case, perceptual processing in the visual sense modality (starting with the 
primary visual cortex) is not directly triggered by sensory input in the visual 
sense modality, because the sensory stimulation in the visual sense modality 
has only one flash, whereas even as early as the primary visual cortex, two 
flashes are processed. 

Again, this seems like multimodal mental imagery without any top-down 
influence. The primary visual cortex is influenced laterally by auditory 
information (namely, the two beeps), but it is not influenced by any top-down 
information (some recent findings suggest that the picture may be more com- 
plicated as previous exposure to similar stimuli may have an important effect 
on the crossmodal illusion—see, for example, Roseboom et al. 2013). And 
there is plenty of evidence that many other crossmodal effects happen very 
early on in perceptual processing and without any top-down interference 
(Senkowski et al. 2011; De Meo et al. 2015). 

But in many other cases of multimodal mental imagery, top-down influ- 
ences are very important. One widely used and researched example of multi- 
modal mental imagery is seeing someone talking on television with the sound 
muted. The visual perception of the talking head in the visual sense modality 
leads to auditory mental imagery in the auditory sense modality (for 
example, Calvert et al. 1997; Pekkola et al. 2005; Hertrich et al. 2011; Spence 
and Deroy 2013). 

The auditory mental imagery will very much depend on factors like the lip 
movements of the person on the screen. But not only these. If this person is 
someone you know or have heard speak, your auditory mental imagery will 
be influenced by this information. If it is Barack Obama (someone you have, 
presumably, heard before), you may “hear” him speaking with his distinctive 
tone of voice or intonation, for example (but even if you don't, your auditory 
cortices behave very differently). This demonstrates nicely the importance of 
top-down influences on multimodal mental imagery. 

The fact that we have auditory mental imagery of Obama's voice and not of 
someone else's voice (or no voice at all), is explained by top-down influences 
on auditory mental imagery. When I am watching Obama's speech with the 
TV muted, my auditory mental imagery is influenced by various past memo- 
ries of hearing Obama speak and my expectation of how his voice would 
sound. Just how far up this top-down influence comes from is a question I 
want to leave open. Wherever it comes from, it is definitely further up from 
the primary auditory cortex, and that is enough for it to count as a top-down 
influence. Multimodal mental imagery—like mental imagery in general—can 
be, but need not be, subject to top-down influences. 


14 
Sense Modalities in Mental Imagery 


Multimodal mental imagery is defined as early cortical processing in one 
sense modality triggered by sensory stimulation in another sense modality. 
But what concept of sense modalities is presupposed in this definition and 
how can we keep apart the different sense modalities? The aim of this chapter 
is to clarify the concept of sense modalities in understanding multimodal 
mental imagery and to talk about the differences between mental imagery in 
different sense modalities as well as the rich interactions between them. 

When I define multimodal mental imagery in terms of early cortical pro- 
cessing in one sense modality triggered by sensory stimulation in another 
sense modality, I do need to rely on keeping apart the two sense modalities 
involved. I have been using phrases like “visual processing triggered by audi- 
tory input” liberally in the previous chapter. But it is important to emphasize 
that this way of thinking about multimodal mental imagery does not presup- 
pose any specific way of individuating sense modalities (see Stokes et al. 2014 
for a good summary on debates concerning the individuation of the sense 
modalities). And the talk of, say, visual perceptual processing is very much 
consistent with the plasticity of the brain. 

There are well-documented cases where the visual areas are recruited for 
other (for example, tactile) tasks (Pascual-Leone and Hamilton 2001; Kupers 
et al. 2011; Kupers and Ptito 2014). Thus, if we (as we should) take the plas- 
ticity of the brain seriously, we should not draw the line between different 
sense modalities in terms of brain regions. It is important to stress that my 
account of mental imagery in general, and multimodal mental imagery in 
particular, does not identify perceptual processing in the different sense 
modalities physiologically, but rather functionally. 

In other words, the difference between visual and auditory processing is 
not a physiological, but a functional difference.¹ Visual processing is not iden- 
tified in terms of brain areas, but rather in terms of its function. Just what this 


1 In this sense, my proposal is consistent with the so-called meta-modal brain hypothesis, accord- 
ing to which perceptual processing is, say, visual, not because it is the processing of visual input, but 
because of the nature of this processing, regardless of where the input comes from (Pascual-Leone and 
Hamilton 2001). 
function would be is something I would like to leave open, but the function 
of, say, visual processing could be identified as something like helping small- 
scale spatial discrimination or transforming input in a way that preserves the 
spatial homomorphism between the input and the perceptual processing. 
Normally, the part of the brain that does this is located at a very specific part 
of the back of the brain in the occipital lobe. But even if, because of the plas- 
ticity of our brain, some other part does this processing, we can identify it— 
functionally, not physiologically—as the locus of visual processing. Similar 
considerations apply to the other sense modalities. 

But the distinction between different sense modalities is important for an 
account of multimodal mental imagery for yet another reason. Mental imagery 
works very differently in different sense modalities. We have seen some pecu- 
liarities of visual mental imagery and its various forms in Chapter 8. But some 
of these forms of mental imagery (like peripheral vision or the filling-in of the 
blind spot) are specific to vision. And mental imagery in other sense modali- 
ties has different peculiarities. 

Take olfaction (Bensafi et al. 2003; Royet et al. 2013; Young 2016, 2020). 
The primary sensory cortex devoted to olfactory processing is the piriform 
cortex. So olfactory mental imagery—early perceptual processing in the 
olfactory sense modality that is not directly triggered by olfactory sensory 
input—typically involves the piriform cortex (Djordjevic et al. 2005; Bensafi 
et al. 2007). 

One of the most interesting findings about olfactory mental imagery is that 
it seems to depend on sniffing (Mainland and Sobel 2006). Sniffing results in 
piriform cortex activation (Sobel et al. 1998; Koritnik et al. 2009). And volun- 
tarily triggered olfactory mental imagery leads to an increased sniffing rate 
(Bensafi et al. 2003; Kleemann et al. 2009). Further, if the (often involuntary) 
sniffing is stopped by some artificial means, the olfactory mental imagery is 
less vivid (Arshamian et al. 2008). 

These findings resemble, at least in a very general structural sense, the find- 
ings about the importance of eye movements for visual mental imagery. As 
we have seen in Chapter 7, if eye movements are artificially suppressed, the 
subject has difficulties conjuring up visual mental imagery. And each time we 
have visual mental imagery, the micromovements of our eyes track the imag- 
ined outlines. So, while we have an important structural similarity between 
vision and olfaction inasmuch as the mental imagery in both sense modalities 
(just like sensory stimulation-driven perception) depends on movements of 
the sense organ, this movement is very different in the two sense modalities 
(sniffing vs. eye movements). 
Further, just as visual mental imagery can be triggered by auditory sensory 
stimulation and vice versa, reading olfactorily charged words, for example, 
can also trigger olfactory mental imagery (not just activation of the piriform 
cortex, which would be a good enough reason to conclude that there is 
olfactory mental imagery, but often also the conscious experience, see 
Gonzalez et al. 2006). And olfactory mental imagery can also be triggered by 
pictures (of food items; for example, see Gottfried et al. 2002, 2004). There is 
also evidence for the attentional modulation of olfactory mental imagery 
(Zelano et al. 2005, 2011). 

Olfactory mental imagery can change with exposure. Wine experts are better 
at having wine-related olfactory mental imagery than novices (although 
they are no better than novices at other forms of mental imagery, 
for example, visual imagery). And wine experts also have stronger and more 
vivid wine-related olfactory mental imagery than olfactory mental imagery 
that is not related to wine (Croijmans et al. 2020). Our ability to form olfac- 
tory mental imagery changes throughout our life. 

An especially exciting and underexplored question concerning olfaction is 
about olfactory amodal completion (Young and Nanay 2022). Amodal com- 
pletion in olfaction can take three different forms. There is spatial 
completion—when the olfactory system fills in sparser parts of the odor 
plumes. There is temporal completion—when the olfactory system anticipates 
the next step in an odor sequence. And there is feature-based completion— 
when the olfactory system fills in a missing feature in a usually co-occurring 
feature-set of odors. The important differences and similarities between visual 
and olfactory mental imagery make it clear how we should not generalize 
from one sense modality to another, but also how multimodal mental imagery 
occurs—in one form or another—in all sense modalities. 

I focused on olfaction because olfactory mental imagery is in some ways the 
least similar to visual mental imagery, but imagery in all sense modalities has 
its peculiarities. Just one example: When you use any handheld tool (like a 
hammer or a tennis racket), your somatosensory cortex, which (to put it somewhat 
simplistically) maps touch on your body, adjusts immediately and localizes 
touch as if it were coming from a hammer- or tennis racket-shaped 
extension of your skin (Miller et al. 2019). There is no sensory input coming 
directly from the tennis racket itself as the tennis racket itself does not have 
tactile receptors. This, according to my account, would count as mental 
imagery: early cortical processing that is not directly triggered by sensory input. 

Being clear about the differences between different sense modalities in 
mental imagery is important for yet another reason. One of the most drawn-out 
debates in philosophy of perception is about the so-called Molyneux’s 
question. Molyneux’s question may have originated from the twelfth-century 
Islamic philosopher, Ibn Tufail, who was the author of the first philosophical 
novel, Hayy Ibn Yaqzan, which was translated to English and published in 
1671 and was very widely read afterwards in England (Russell 1994). It is 
called Molyneux’s question, because of a question the seventeenth-century 
Irish philosopher William Molyneux posed in a letter addressed to John 
Locke in 1688. The question is simple: suppose that a blind subject is familiar 
with two very differently shaped objects by tactile perception. If her vision 
were restored would she be able to tell them apart and identify them visually? 
So our blind subject handles a cube and a sphere and then when her sight is 
restored and she looks at a cube and a sphere, can she identify one as a sphere 
and the other one as a cube? And in the centuries that followed, answering 
Molyneux’s question (and, preferably, giving an original answer) has been a 
challenge for any aspiring philosopher of perception. 

Locke himself answered no (Locke 1690, II, ix), as did George Berkeley 
(Berkeley 1709, p. 41, p. 110; Berkeley 1710, p. 43). Gottfried Leibniz, directly 
contradicting Locke, said yes (Leibniz 1704/1765, II, ix). Thomas Reid 
thought the question was ambiguous (depending on whether it is about two- 
dimensional or three-dimensional shapes, and also depending on the subject’s 
expertise,* we get different answers; see Reid 1764/1997, VI, 3, 7, 11). But the 
discussion of Molyneux’s question did not stop in the eighteenth century (see 
Degenaar 1996). Gareth Evans gave a very original answer defending the 
Leibnizian positive answer, kicking off a new wave of debates (Evans 1985; see 
also Jacomuzzi et al. 2003; Schumacher 2003; Noé 2004; Campbell 2005; 
Levin 2008; Bruno and Mandelbaum 2010). As Matthen and Cohen 2019 
point out, there are by now many, somewhat different, versions of Molyneux’s 
questions and not all of these accounts are answering the same question 
(see also Glenney 2013; Ferretti 2017). 

This was a theoretical question in the seventeenth century and it still was at 
the turn of the century. But it has been suggested that the question can be 
answered, given today’s medical technology, in an empirical manner. And this 
is exactly what was done more than a decade ago with congenitally blind peo- 
ple after their sight was restored (Held et al. 2011). What these findings show 
is that after having their sight restored, these subjects could immediately 
match one visual shape with another one (just as they could match one haptic 
shape with another one), but they could not match the shapes across sense 
modalities: so they could not match a visual shape and a haptic shape, which 
would have been the task at stake in the Molyneux debate. They did manage 
to acquire this ability in a couple of days, but not immediately after having 
their sight restored. 

* In fact, there might be a way of interpreting Reid on which the dependence on expertise itself 
correlates with the vividness of the subject’s mental imagery (see Reid 1764/1997, esp. pp. 117-18 
(6.11)), which would make Reid’s view consistent with mine. But I will not attempt to argue for this 
historical claim here (or anywhere else). 

So a tempting resolution of Molyneux’s question would be that the data is 
in and it supports the Locke/Berkeley line of thought. Leibniz and Evans lost 
(see Connolly 2013; Cheng 2015; and Clarke 2016 for some methodological 
reasons why we might want to resist this temptation though). 

I want to question this conclusion and tackle Molyneux’s question on the 
basis of what we know about multimodal mental imagery. My claim is that 
Molyneux’s question doesn’t have a generic answer. While this may be a 
somewhat disappointing take, the reasons for it, concerning individual differ- 
ences between blind subjects, are hopefully less disappointing. 

There are different forms of blindness. Cortically blind people have dam- 
aged visual cortices, so they are unlikely to have any form of visual imagery 
(but see de Gelder et al. 2015). But most blind people’s visual cortices are 
intact. And many of them can visualize, often as vividly as sighted people. 
They can also have visual dreams. They also have crossmodally triggered 
mental imagery (conscious or unconscious; see Vetter et al. 2020). And, cru- 
cially, these blind people’s visual cortex maps spatial locations in a “retino- 
topic” manner—where the word “retinotopic” is between scare quotes because 
these subjects, being blind, have no activation on their retina. Nonetheless, 
their visual cortices represent the space in front of them (and the various— 
mainly auditory—stimuli in this space) in a way that is structured exactly the 
way retinotopic representation of space in sighted subjects is structured 
(Norman and Thaler 2019). In short, many blind subjects have genuinely 
visual mental imagery. Not just some kind of processing in the part of the 
brain where we find visual processing in sighted subjects. They have the kind 
of retinotopic visual processing in their visual cortex that sighted subjects do. 

Things are a bit more complicated when it comes to the mental imagery 
abilities of congenitally blind people, that is, people who are blind from birth 
(see Arditi et al. 1988). Given the brain’s propensity to rewire unused parts of 
the brain in order for them to do something useful, the visual cortex of con- 
genitally blind people is a prime candidate for performing non-visual func- 
tions. And it has indeed been found that the visual cortex of congenitally 
blind people performs a wide variety of functions that are not about the seg- 
mentation of the two-dimensional visual input (Bedny 2017). But this is true 
of the visual cortex in general, very much including the visual cortex of 
sighted subjects (see Seydell-Greenwald et al. 2020 for an especially impres- 
sive study). Further, the crucial results about the “retinotopic” representation 
of space in front of the subject I talked about in the previous paragraph also 
hold for many congenitally blind subjects: their visual cortices also localize 
(auditory) stimuli in the space in front of them in a way that is structured 
retinotopically (again, in the absence of any retinal input) (Striem-Amit et al. 
2015). So while congenitally blind people rarely have the kind of phenomenal 
feel that accompanies mental imagery in sighted subjects (Cattaneo et al. 
2008; Kupers et al. 2011), given the way their visual cortices behave, we can 
conclude that they can have visual imagery, that is, early visual processing— 
again, not just processing where V1 is found in sighted subjects, but retinoto- 
pically structured visual processing—that is not directly triggered by 
sensory input. 

In short, some blind people (including some congenitally blind people) 
have visual mental imagery, others don’t (Aleman et al. 2001; see also 
Villey 1930 for an early account of this). Those who do would also have multi- 
modal visual mental imagery, that is, visual mental imagery that is triggered 
by sensory stimulation in a different sense modality. So they would have 
visual mental imagery of the sphere when they handle the sphere—an 
instance of multimodal visual mental imagery triggered by touch. And they 
would have visual mental imagery of the cube when they handle the cube. 
Blind people who lack visual mental imagery would not have any of this. 

The question is, which category of blind subjects is Molyneux’s question 
asking about? The cortically blind subjects would have very little chance to 
identify the cube and the sphere, but those blind subjects who have visual 
(and multimodal) mental imagery can use their visual mental imagery to 
identify the cube which they now see by means of sensory stimulation-driven 
perception. So depending on the state and use of the visual cortices of the 
Molyneux subjects, we get very different answers. 

The Held et al. (2011) experiments, therefore, should be considered to be a 
partial answer to Molyneux’s question: for some blind subjects, the answer to 
Molyneux’s question might indeed be no. But this is not the final answer. 
While the original Held et al. (2011) study is not very specific about whether 
the subjects whose sight was restored in this experiment had the ability to 
have visual imagery before the operation, it is clear from a follow-up study 
(which studied the visual imagery of these subjects post-operation), that it 
is unlikely that they had visual imagery before the operation (Gandhi 
et al. 2014). 

In other words, the reason why the Held et al. (2011) experiment (and its 
follow-up experiments) only gave a partial answer to Molyneux’s question is 
that they are silent about what the answer to Molyneux’s question should be 
in the case of those blind subjects who do have the ability to have visual men- 
tal imagery. And, as we have seen, we have good reason to suppose that as the 
visual mental imagery of these subjects can be triggered by sensory stimula- 
tion in another sense modality (crucially for our purposes, in the tactile sense 
modality), these subjects would have visual mental imagery of spheres and 
cubes long before their sight was restored. And then, after the operation, they 
could and would utilize their visual mental imagery of cubes and spheres in 
visually recognizing cubes and spheres.* 

It needs to be emphasized that this is an empirical hypothesis that has not 
been tested. The empirical hypothesis is that if the sight of those congenitally 
blind people who have visual mental imagery were restored, they would be 
able to match seen and felt objects. This empirical hypothesis has not been 
confirmed or disconfirmed and my aim is to give theoretical reasons for why 
we should expect this empirical hypothesis to be true. But we will only know 
for sure if we find congenitally blind people who can be demonstrated to have 
visual mental imagery and restore their sight. In other words, we would need 
to conduct a Held et al. (2011) style experiment but with close attention to the 
visual imagery abilities of the experimental subjects. 

In short, Molyneux’s question gets very different answers depending on the 
visual imagery abilities of the blind people in question (see Nanay 2020d for a 
more detailed argument for this claim). “Blind people” is not a monolithic 
category. To abuse Tolstoy’s famous first line of the novel Anna Karenina: all 
vision is alike, but all blind people are blind in their own way (see also Block 
2016 for reusing this line for other purposes in philosophy of perception). 
Lots of things need to come together for someone to perceive visually. 
Consequently, lots of things can go wrong. There could be problems with the 
retina, with the main visual pathway, with the lateral geniculate nucleus, with 
the early cortices, and so on. Any one of these problems would result in blind- 
ness, but very different kinds of blindness (Cattaneo and Vecchi 2011). 

Treating blindness as a monolithic phenomenon would paper over these 
crucial differences. And the Molyneux question does just this: it papers over 
the crucial differences between very different kinds of blindness. When 
Molyneux’s question was originally posed, this might not have been as obvious as 
it is now (although Diderot’s Letter on the Blind tells another story). But now we 
do know this, so there is no reason why we would want to know the answer to a 
question about “the blind,” as there is no such thing as “the blind.” The historical 
misadventures of trying to answer Molyneux’s question are a beautiful demonstration 
of this. 

* I will come back to the importance of keeping the visual cortices of blind subjects in use in 
Chapter 15. 


15 


Sensory Substitution and Echolocation 


The fact that many blind people have visual imagery is especially important 
(and especially relevant in practical terms) when it comes to the various 
means by which blind people’s navigational abilities can be improved. I will 
talk about two of these in this chapter, sensory substitution and echolocation, 
and argue that both substantially involve the subjects’ early visual cortices 
and, as a result, both count as examples of multimodal mental imagery. The 
same argument also applies to cane use and Braille (see Burton 2003 for a 
summary), but in this chapter, I will focus on the more surprising results con- 
cerning sensory substitution and echolocation. 

Blind subjects can be taught to navigate their environment in some sense 
“visually” by having a camera installed on their body, the images of which are 
fed into some other sense modality of the subject. The camera is recording 
images continuously and these images are transmitted to the subject in real 
time in the tactile sense modality, for example (it can also be done auditorily, 
see Meijer 1992). So the images are imprinted on the subject’s skin with slight 
pricks as soon as they are recorded (see Bach-y-Rita et al. 1969; Bach-y-Rita 
and Kercel 2003). A lot of research has been done about this phenomenon in 
the last four decades (Meijer 1992; Sampaio et al. 2001; Tyler et al. 2003; 
Auvray et al. 2007; Amedi et al. 2007; Ward and Meijer 2010; Deroy and 
Auvray 2012 for summaries; and Chirimuuta and Paterson 2015 for a histori- 
cal overview of the sensory substitution research as well as the chapters in 
Macpherson 2018 for the philosophical import of these findings). 

The surprising results were that the subjects eventually experienced the 
scene in front of them “visually”: they talked about visual occlusion, for 
example, and they were very competent at navigating relatively complex ter- 
rains. They “spontaneously report the external localization of stimuli in that 
sensory information seems to come from in front of the camera, rather than 
from the vibrotactors on their back” (Bach-y-Rita et al. 1969, p. 964). 

Philosophers were quick to jump on these findings for philosophical 
ammunition in the grand debate about how we should individuate the senses 
(see, for example, Morgan 1977; Heil 1983, 2011; Peacocke 1983; Hurley and 
Noé 2003; Gray 2011; Farina 2013; but see also Block 2003). The big question 
was: is “vision” that is assisted by sensory substitution really vision? Or is it 
tactile perception? Some of the classic ways of individuating the senses (Grice 
1962; Keeley 2002; Nudds 2004, 2011) come apart in this odd case: if we indi- 
viduate the senses according to the sense organ involved, then sensory 
substitution-assisted “vision” would count as tactile perception. If we individuate 
the senses according to phenomenology, then it seems to be vision.¹ 

In the light of the discussion in Chapter 13, this debate is somewhat mis- 
guided. Sensory substitution-assisted “vision” is neither vision nor tactile 
perception, because it is not perception at all. It is mental imagery— 
multimodal mental imagery. It is visual mental imagery triggered by tactile 
sensory stimulation (see Nanay 2017a for a longer version of this argument). 

If there is activation in the early visual cortices of the sensory substitution 
subjects, then they have multimodal mental imagery: early cortical activation 
in one sense modality (vision) triggered by sensory stimulation in another 
sense modality (touch or audition). 

And, as it turns out, there is indeed activity in the primary visual cortex of 
these subjects that was clearly not triggered by visual sensory stimulation, as the 
subjects were blind. It was triggered by sensory stimulation in the tactile 
sense modality (Renier et al. 2005a; Murphy et al. 2016).² So, perception by 
sensory substitution would count as multimodal mental imagery: visual 
mental imagery triggered by tactile sensory stimulation. 

It might be objected, as in Chapter 14, that it is unclear whether the visual 
cortices of blind people are really visual. But as we have seen, we have evi- 
dence that the “retinotopic” representation of space in the cortices of these 
subjects indicates genuine visual processing (and not some kind of non-visual 
processing in the brain region where, in sighted subjects, the visual cortex is 
located, see esp. Norman and Thaler 2019; Seydell-Greenwald et al. 2020). 

This is not an entirely novel angle in the sensory substitution debate.* 
Renier et al. (2005b) argue that subjects with sensory substitution devices 
“visualize.” This way of thinking about sensory substitution points in the 
same direction as the one I outlined here, but talking about visualization is 
misleading for a number of reasons (see also Martin and Le Corre’s 2015 
detailed criticism of Renier et al. 2005b). 

1 I say “seems to be” because there are disagreements about the exact phenomenology of these 
experiences. 

2 Multimodal areas are also involved in later processing of sensory substituted vision (Amedi et al. 
2007). But what matters in determining whether this is an instance of visual multimodal mental 
imagery is whether there is early activation of visual areas. See Chebat et al. (2018). 

* Interestingly, while the academic discussion of sensory substitution does not talk about mental 
imagery, some of the publicity material of the most widely used sensory substitution device, vOICe, 
uses this term surprisingly often. See https://www.seeingwithsound.com/imagery.htm. 

First, visualizing is both a voluntary and an intended act and it seems that 
sensory substitution-assisted “seeing” is neither. Second, visualizing is some- 
thing that happens in a necessarily top-down manner, whereas sensory 
substitution-assisted “seeing” is only top-down inasmuch as normal vision is. 
Finally, the ultimate conclusion of Renier et al. (2005b) is that sensory 
substitution-assisted “seeing” is in fact seeing. And they use this claim about 
visualization as a premise for establishing this conclusion (see, again, Martin 
and Le Corre’s 2015 criticism). Talking about multimodal mental imagery, 
rather than “visualization,” in these cases fends off all of these worries. 

Further, the empirical evidence that Renier et al. (2005b) use as support for 
their “visualization” account also supports my multimodal mental imagery 
account: subjects with sensory substitution devices undergo the Ponzo illu- 
sion (see Figure 11): an illusion that is widely held to be a visual illusion (see 
Nanay 2009b). 

If sensory substitution subjects have real-time multimodal mental imagery, 
this is exactly what we should expect. Their visual perceptual processing is 
triggered by tactile sensory stimulation. But the perceptual processing hap- 
pens in the visual sense modality. Thus, we should expect the usual oddities of 
this visual perceptual processing, like size constancy illusions, to be present— 
and they are. 

Why is it tempting to think then that subjects who are assisted by sensory 
substitution devices do in fact perceive? One reason may be that the subjects’ 
perceptual processes are involuntary. But we have seen that mental imagery 
may be voluntary or involuntary. Another reason may be that the subjects 
localize the “visual” scene they navigate in their egocentric space. But, again, 
as we have seen, mental imagery may or may not localize in one’s egocentric 
space. Also, in the reports of these subjects it is not entirely clear whether they 
have any feeling of presence of the visual scene in front of them. But even if 
they do, this would be consistent with the claim that they have multimodal 
mental imagery as mental imagery may or may not be accompanied by the 
feeling of presence. 

Figure 11 Ponzo illusion 

An additional reason why one may be tempted to think that whatever sensory 
substitution can give us must be perception is that it helps us navigate in 
the world and it is clearly causally influenced by the world: it tracks, in real 
time, the visual features of the world around us as they change. How can it 
possibly be mental imagery then? 

It should be clear by now that some ways of exercising mental imagery do 
track the features of our surroundings in real time and in a causally more- or 
less-responsive manner (see also Chapter 24 on just how causally responsive 
these processes are). Amodal completion does, for example. If the cat moves 
behind the picket fence, my mental imagery that is responsible for the amodal 
completion changes accordingly. And multimodal mental imagery also tracks 
the features of our surroundings in real time in a causally (more- or less-) 
responsive manner: if the noises my loud coffee machine in the next room 
makes are changing, my multimodal mental imagery of its visual parts also 
changes. There is nothing about the definition of mental imagery that would 
exclude the possibility of tracking the changing features of our environment 
in real time. 

A final reason for puzzlement about my claim that these blind subjects 
have multimodal mental imagery would come from the seemingly obvious 
assumption that, given that blind subjects can’t see, they couldn’t have visual 
imagery either. As we have seen, this is just factually incorrect. It has been 
known for a long time that blind people also have visual imagery, and some- 
times even very salient visual imagery (see Chapter 14 for references). In sen- 
sory substitution, this visual mental imagery is triggered by tactile input. 

But then there is nothing mysterious about sensory substitution—there are 
no quick and easy philosophical lessons about the individuation of the senses 
involved either.* Sensory substitution involves perceptual processing and very 
clearly visual perceptual processing, as the activity in the primary visual cortex 
shows (and this coincides with the phenomenology of the subjects). And 
this visual perceptual processing is induced by tactile sensory stimulation: 
slight pricks on the subject’s skin. This is as clear-cut a case of multimodal mental 
imagery as it gets. If philosophers want some empirical findings that would 
help them in the debate about the individuation of the senses, they need to 
look elsewhere. 

* There are a lot of exciting empirical questions about sensory substitution, of course; for example, 
what kind of learning mechanism is responsible for the formation of multimodal mental imagery in 
the users of sensory substitution devices. The relation to another acquired, and somewhat 
odd, form of multimodal mental imagery, namely synesthesia, is another important new research 
direction; see Chapter 16. 

Another important phenomenon by means of which visual mental 
imagery can play a crucial role in visually impaired people's lives is echolo- 
cation. Echolocation is a form of perception that bats, dolphins, and some 
species of whales are known to use. It consists of emitting sounds: given 
the different rates at which the emitted sound waves return to the animal’s 
ears, its brain can gain information about the outline of the features 
around it. 

Echolocation is a well-studied and fairly well-understood phenomenon in 
bats, dolphins, and some species of whales. But humans can also echolocate, 
and blind people can as well, some remarkably successfully. The general technique 
is to emit hardly audible clicks; the varying rates of the echoes of 
these clicks provide information about the features around the subject 
(Dodsworth et al. 2020). 

Echolocation used to be little more than an urban myth, fueled partly by 
viral videos of blindfolded people cycling or skateboarding while using only 
their echolocation to navigate their environment, but very few blind people 
used this mode of perception to get around. This has changed recently as we 
know more about the mechanism and especially the neuroscience of echolo- 
cation (Wallmeier et al. 2013; Kolarik et al. 2014). Further, some non-profit 
organizations have developed echolocation techniques specifically for blind 
people, which included methods (for example, alternating louder and quieter 
clicks) for gaining more complex spatial information (for example, about the 
scene behind the echolocating subject and about the relative distance between 
two distal objects, one behind the other). 

From our point of view, the crucial findings about echolocation concern 
the link between echolocation abilities and visual mental imagery. And here 
a number of different findings point in the direction that echolocation relies 
on visual mental imagery. First, and most importantly, during echolocation 
(both in sighted and in blind people), the early visual cortices are active 
(Thaler et al. 2011, 2014b; Fiehler et al. 2015; Flanagin et al. 2017). Second, 
people who report more vivid visual mental imagery echolocate better (Thaler 
et al. 2014a). Third, in sighted subjects, visual stimulation interferes with the 
ability to echolocate (given that it puts a strain on the early visual cortices) 
(Thaler and Foresteire 2017). 

All these results suggest that echolocation is, in fact, a form of multimodal 
mental imagery. It is early visual processing (in the visual cortices) that is not 
directly triggered by sensory input in the visual sense modality (given that 
subjects receive no visual sensory stimulation at all). It is early visual process- 
ing that is triggered by auditory input (by hearing the clicks emitted earlier). 

This picture is consistent with research on the navigating and localizing abili- 
ties of blind subjects. When blind subjects navigate their environment or when 
they localize auditory or other stimuli, their visual cortices are active (Weeks 
et al. 2000). This is also true of congenitally blind subjects (Kupers et al. 2010). 

As we have seen in Chapter 14, some blind subjects (even some congeni- 
tally blind subjects) are capable of having visual mental imagery. And this 
ability heavily depends on the state of the visual cortices of these blind sub- 
jects. If, as I suggested, echolocation is a form of multimodal mental imagery, 
then the ability to echolocate also depends on the state of the visual cortices of 
these blind subjects. 

And at this point, the theoretical claims of this chapter (namely, that echo- 
location as well as sensory substitution-assisted “vision” count as multimodal 
mental imagery) become much more than a categorization issue or a way of 
merely showing how far the concept of multimodal mental imagery reaches. 

The interpersonal variations in these two techniques are huge. Some blind 
subjects pick up these techniques very quickly and use them efficiently, while 
others struggle even with the most rudimentary steps. Knowing that both 
sensory substitution-assisted “vision” and echolocation are forms of multi- 
modal mental imagery could help us to develop techniques for training blind 
people's visual cortices, which allow for more efficient spatial perception and 
navigation. 

Given the well-demonstrated plasticity of the brain, if a brain region is not 
used regularly, it is reallocated to do something else. More specifically, if blind 
subjects have unaffected visual cortices but do not use them, the visual cortices 
get reallocated (to, for example, auditory or olfactory processing). If they 
use their visual cortex, it works well; if they don’t, it will eventually stop 
processing visual information, thereby making it impossible for the subject to 
have visual mental imagery (and, as a result, also making it impossible for 
them to use their mental imagery for navigational techniques like echolocation 
or sensory substitution-assisted “vision”). 

In short, both of the techniques that help blind people navigate their 
environment, namely, echolocation and sensory substitution-assisted “vision,” 
rely heavily on the functioning of the early visual cortices. The key insight is 
that blind subjects can navigate their environment better if their visual corti- 
ces are in good condition. But how to achieve that? We can keep the primary 
visual cortices of blind subjects in shape if we have them use their visual men- 
tal imagery. Active reliance on mental imagery prevents the early visual corti- 
ces of blind subjects from being reallocated to other brain functions and 
thereby allows them to make full use of navigation techniques like sensory 
substitution and echolocation. 


16 


Synesthesia 


Some synesthetes hear a musical note and experience it as having a specific 
color (Ward et al. 2006). Some others experience a specific color each time 
they see a specific black numeral or letter printed on white background (Sagiv 
et al. 2006; Tang et al. 2008; Jonas et al. 2011). Synesthesia comes in various 
different forms: lexical-gustatory synesthesia (strong taste experiences when 
looking at letters; see Ward and Simner 2003; Jones et al. 2011), colored touch 
synesthesia (color experiences when touching different things; see Ludwig 
and Simner 2013), spatial time units synesthesia (spatial experience when 
thinking about time units like the days of the week or the months of the year; 
see Smilek et al. 2007; Brang et al. 2011; Jarick et al. 2011). The list could go on. 

Given the diversity of phenomena referred to as synesthesia, there are some 
definitional issues here. Synesthesia has been defined as “stimulation of one 
sensory domain leading to a perception in another sensory domain” (Harrison 
and Baron-Cohen 1997), where the “stimulation in one sensory domain” is 
usually referred to as the “inducer” and the “perception in another sensory 
domain” is referred to as “concurrent.” Others define synesthesia as “the elici- 
tation of perceptual experiences in the absence of the normal sensory stimu- 
lation” (Ward and Mattingley 2006) or “stimulation in one sensory or 
cognitive stream [that] leads to associated experiences in a second unstimu- 
lated stream” (Simner 2012). These definitions pick out slightly different sets 
of phenomena, but from the point of view of this book, it is striking how sim- 
ilar these definitions are to the definition of mental imagery (especially the 
Ward and Mattingley 2006 definition) or multimodal mental imagery (the 
other two definitions). 

And, unsurprisingly, it has been repeatedly suggested that synesthesia is 
intricately linked to unusual ways of exercising one’s mental imagery, although 
it is not always entirely clear what the exact connection is. The aim of this
chapter is to show that all forms of synesthesia are forms of (often very
different kinds of) mental imagery and, further, that taking synesthesia to be a
form of mental imagery is not a mere relabeling of the phenomenon: it has
important explanatory advantages, especially when it comes to
understanding synesthetic experiences that are not triggered by sensory
stimulation (see Nanay 2021c for a longer version of this argument).

Synesthesia research makes a distinction between two kinds of synesthetes: 
associators and projectors (Dixon et al. 2004; see also Ward et al. 2007 for 
finer distinctions between surface projectors, space projectors, see-associators, 
and know-associators; and see Edquist et al. 2006 for some further wrinkles). 
When associators see letters printed in black, they associate colors, but they 
do not experience the colors of these letters “out there” in their egocentric 
space. Nor do they experience the letters as having this specific color. 
Projectors, in contrast, do seem to see colors located where (or sometimes 
close to where) the black letters are located. While the experience of associa- 
tors, but not of projectors, is often compared to the experience of mental 
imagery, I will argue that all instances of synesthesia in fact count as a form of 
(multimodal) mental imagery. 

Synesthesia involves activation of the early cortical areas of the synestheti- 
cally activated “sensory streams” (to use the terminology of Simner 2012). So 
if synesthetes have a color experience when hearing a certain pitch, there will 
be perceptual processing in their visual sense modality (Barnett et al. 2008). 
Crucially from our point of view, this perceptual processing happens very 
early on, in most cases in the primary or secondary visual or auditory cortex 
(see, for example, Nunn et al. 2002; Hubbard et al. 2005; Jones et al. 2011). As 
this early perceptual processing is not directly triggered by sensory input, this 
is an instance of mental imagery. 

Nonetheless, not everyone agrees that synesthesia is a form of mental 
imagery. The synesthetic experiences of projectors are routinely characterized 
as different from mental imagery. Some (for example, Deroy and
Spence 2013) claim that the synesthetic experience of projectors is not mental
imagery on the basis of projectors' introspective reports that
visualizing feels different from synesthetic experience. Others (for example,
Craver-Lemley and Reeves 2013) take synesthesia to be different from mental
imagery because they take mental imagery to be necessarily voluntary. But we
have seen that mental imagery can be involuntary and that different forms of
mental imagery can “feel” very different. In other words, once mental imagery
is understood in this way, these accounts of synesthesia are very much
consistent with mine. I will argue below that considering synesthesia to be a
form of mental imagery is not a merely verbal move but has important
explanatory consequences.

More generally, there have been intense debates about just what kind of 
experience synesthetic experience is. Is it a form of perceptual experience
(Cohen 2017; Matthen 2017)? Is it a form of hallucination (Fish 2010)? Or is
it some kind of higher-level, cognitive/linguistic experience (Simner 2007)? 
The problem is that synesthesia doesn't really seem to fit squarely into any of 
these categories. 

The default position about synesthetic experiences is that they are percep- 
tual experiences—maybe somewhat unusual perceptual experiences (see the 
definitions above from Harrison and Baron-Cohen 1997 and Ward and 
Mattingley 2006, which explicitly talk about synesthetic experiences as per- 
ceptual experiences; and see also Cohen 2017 and Matthen 2017 for summa- 
ries). How is the account I am defending here different from this perceptual 
view? In some ways, the difference is merely terminological, inasmuch as 
mental imagery is explicitly defined as perceptual processing that is not 
directly triggered by sensory input. In other words, if synesthesia is a form of 
mental imagery, it is thereby a form of perceptual processing (one that is not 
directly triggered by sensory input). Hence, the mental imagery view would 
be consistent with at least some proposals, according to which synesthetic 
experience is perceptual experience. It would be consistent with Matthen’s 
view, for example, according to which perceptual experience is the “accurate 
imagistic representation of some occurrence in the world that the subject 
understands as such” (Matthen 2017, p. 166). 

So everybody agrees that synesthetic experience is brought about by per- 
ceptual processing. But there is a major distinction between perceptual pro- 
cessing that is directly triggered by sensory input and perceptual processing 
that is not directly triggered by sensory input. The former is “sensory 
stimulation-driven perception” and the latter is “mental imagery.” As perception
per se (as contrasted with perceptual processing) has been widely taken
to entail sensory stimulation-driven perceptual processing (after all, it is the 
causal link via sensory stimulation that ensures the causal connection to the 
world, which is an essential feature of perception), taking synesthesia to be a 
result of not just perceptual processing, but sensory stimulation-driven perceptual
processing has been the mainstream view. In contrast, I will argue that
synesthetic experience is not sensory stimulation-driven perception, but 
mental imagery. I will argue that, by pinpointing that synesthesia is a very
specific kind of perceptual process, namely one that is not directly triggered
by sensory input, my view provides an explanatorily unified account of
synesthesia, which explains the experiences of both projectors and associators
as well as less-central cases of synesthesia (where the inducer is not sensory
stimulation-driven) as instances of mental imagery.

Here are some further reasons to think that synesthesia is a form of mental
imagery. Synesthetes across the board (both associators and projectors) have 
more vivid mental imagery than non-synesthetes (Barnett and Newell 2008; 
Eagleman 2009; Price 2009a, 2009b; Meier and Rothen 2013; Amsel et al. 
2017; but see also Grossenbacher and Lovelace 2001; Simner 2013 for some 
wrinkles and exceptions; Chiou et al. 2018 for discussion). And this differ- 
ence is modality specific—so lexical-gustatory synesthesia subjects have more 
vivid gustatory mental imagery, but not necessarily more vivid mental 
imagery in the, say, auditory sense modality (Spiller et al. 2015). Further, syn- 
esthesia is very rare among aphantasia subjects (who have no, or hardly any, 
conscious mental imagery) and relatively frequent among hyperphantasia 
subjects (who have very vivid mental imagery) (Zeman et al. 2015). 

Some instances of synesthesia are multimodal—for example, the pitch and 
color synesthesia I started the chapter with. Some other instances of synesthe- 
sia are unimodal—for example, the grapheme-color synesthesia (of having 
colored mental imagery of numerals or letters), which seems to be the most 
widespread form of this condition. 

I'll start with multimodal cases. Hearing a certain pitch and having a visual
synesthetic experience of a certain color is a clear case of multimodal mental 
imagery: the perceptual processing in the visual sense modality is triggered 
by sensory stimulation in the auditory sense modality. And seeing a letter and 
having the gustatory synesthetic experience of a flavor is also a clear case of 
multimodal mental imagery (where the perceptual processing in the gusta- 
tory sense modality is triggered by the sensory stimulation in the visual sense 
modality). 

The question is, then, how this differs from other cases of multimodal men- 
tal imagery (like the example of watching Obama’s speech on TV muted). The 
difference is that in non-synesthetic cases of multimodal mental imagery, the 
crossmodal activation is explained by previous exposure. You “hear” Obama's 
distinctive tone of voice when you watch his speech muted because you have 
on previous occasions heard him and seen him (on TV, presumably) at the 
same time. So when you now only have access to the visual part of this famil- 
iar multisensory event, you fill in the familiar auditory part of it. 

In the case of synesthesia, the crossmodal activation is not explained by previ- 
ous exposure. When you see the color purple each time you hear the note of high 
C, this is not explained by your previous exposure to purple high Cs in the past. 
Purple high Cs are not familiar multisensory events that you have encountered 
many times in the past. But that is the only difference between the synesthetic 
and the non-synesthetic forms of multimodal mental imagery.

And this is true not only of associators (who often report something like
involuntary visualizing experiences) but also of projectors (who don't). The 
self-report of many projectors indicates that they take themselves to literally 
see the color of musical notes. Nonetheless, given that the visual perceptual 
processing of the color is not directly triggered by visual sensory input (but 
rather by auditory sensory stimulation), this counts as mental imagery, not 
perception. The fact that synesthesia subjects can mistake one for the other 
indicates that the mental imagery involved in synesthesia comes with the feel- 
ing of presence (like many forms of mental imagery, see Chapter 3). 

So the difference between projectors and associators is merely a familiar 
difference between different forms of mental imagery—for example, whether 
it localizes its object in one’s egocentric space or not. As we have seen, this is 
an important distinction between different instances of mental imagery and 
this distinction also applies within the domain of synesthetic experiences, 
where one standard way of describing the difference between projectors and 
associators is that the former’s experiences locate the concurrent in the subject’s
egocentric space, whereas the latter’s do not (Eagleman et al. 2007).
Another influential way of keeping the experiences of projectors and associa- 
tors apart is to ask whether these experiences are accompanied by the feeling 
of presence or not (on the role of the feeling of presence in various forms of 
synesthesia, see van Leeuwen et al. 2011; Seth 2014). Some instances of men- 
tal imagery are accompanied by the feeling of presence, whereas others are 
not. Ditto for synesthetic experiences, where this distinction may mark the 
difference between projectors and associators. In fact, the experience of pro- 
jectors is, in some ways, more similar to other ways of exercising multimodal 
mental imagery (like the Obama speech case) with regard to the feeling of
presence. 

Without taking sides in the complex debates about the phenomenology of 
projectors and associators (and, again, acknowledging that neither of these
is a monolithic category; see Ward et al. 2007 for finer distinctions between
surface projectors, space projectors, see-associators and know-associators, 
etc.), the more general point here is that standard distinctions between differ- 
ent forms of mental imagery can help us understand the difference between 
projectors and associators. 

This explanation puts synesthesia on a continuum with other forms of mul- 
timodal mental imagery, ones we experience all the time. And this way of 
thinking about synesthesia is consistent with a recent set of findings, which 
shows that synesthesia can be artificially induced in about half of
non-synesthetes with only five minutes of sensory deprivation (Nair and Brang
2019). When people who have never experienced synesthesia before are cut
off from any kind of sensory stimulation for only five minutes, the result is 
that, coming out of sensory deprivation, more than half of them experience 
some form of synesthesia (see also Gatzia and Brogaard 2016 for other exam- 
ples of artificially induced synesthesia). 

If we consider synesthesia to be a form of mental imagery, these findings 
should not come as a surprise. We know that sensory deprivation induces 
perceptual processes that are not directly triggered by sensory input, because 
the subjects get no sensory input whatsoever and because the perceptual sys- 
tem keeps on functioning even in the absence of any stimulation (see Berkes 
et al. 2011). And these perceptual processes that are not directly triggered by 
sensory input—that is, this mental imagery—explain why subjects subse- 
quently (that is, after getting out of sensory deprivation) tend to have synes- 
thetic experiences—that is, mental imagery. If we take synesthetic experiences 
to be plain stimulus-driven perceptual experiences, no such explanation is 
available. In other words, taking synesthesia to be a form of mental imagery 
has some immediate explanatory benefits. 

This explanatory scheme is only applicable to multimodal cases of synes- 
thesia. But how can we then explain the more widespread unimodal cases of 
synesthesia, like the most widespread form, grapheme-color synesthesia? 

A straightforward way of extending this account of multimodal synesthesia 
is to say that, just as multimodal cases of synesthesia happen when perceived 
(or perceptually processed) properties across sense modalities are bound to a 
multisensory individual in unusual ways, unimodal cases of synesthesia hap- 
pen when perceived (or perceptually processed) properties in one sense 
modality are bound to a unimodal sensory individual in unusual ways. So our 
perceptual system binds shape, size, and color properties to the same uni- 
modal, say, visual, sensory individual. And the reason for this is that most 
objects we see tend to have shape, size, and color. When we see a banana, for 
example, our perceptual system tends to attribute properties of all of these 
three kinds to it (that is, shape, size, and color). 

But the perceptual system of some people binds color properties to uni- 
modal sensory individuals in a way that does not correspond to past exposure 
to unimodal sensory individuals of this kind. Bananas tend to be yellow, so 
having mental imagery of yellow when presented with a grayscale picture of a 
banana is something we should expect as long as we have been exposed to 
yellow bananas in the past (a topic I will come back to in Chapter 18). But the 
grapheme W does not tend to have a specific color (not even black) and,
crucially, our past exposure to the grapheme W is not systematically also an
exposure to, say, the color purple. So, given the lack of past exposure to
purple Ws, it is surprising that our visual system would complete Ws with the 
mental imagery of the color purple. 

So, just as we complete multisensory individuals, we also complete uni- 
modal sensory individuals. And just as the completed multisensory individu- 
als can be individuals we do not normally encounter (colors with a certain 
specific pitch), the completed unimodal sensory individuals can be individu- 
als we do not normally encounter (graphemes with a certain specific color). 

I should acknowledge that this is not intended to be a full explanation of all 
aspects of synesthesia. I haven't said anything about what causes some people 
and not others to bind properties to these very unusual (multi)sensory indi- 
viduals. But clarifying how exactly the mental states of synesthesia subjects 
differ from the mental states of other subjects (that is, in the (multi)sensory 
individuals that they bind properties perceptually to) should be an important 
step towards such a full explanation. 

It has been suggested that sensory substitution is a form of synesthesia 
(Proulx and Stoerig 2006; Ward and Meijer 2010; Ward and Wright 2012; 
Ward 2013; but see Farina 2013 for criticism). We can make sense of this sug- 
gestion without being forced to take synesthetic experiences to be similar to 
the experience of sensory-substituted vision (the two seem very different
indeed) inasmuch as, in my framework, both count as (multimodal) mental
imagery: both are explained by early cortical perceptual processing in one 
sense modality that is triggered by sensory stimulation in another sense 
modality. But the resulting experiences are very different (as multimodal 
mental imagery may manifest in very different experiences). 

I said earlier in this chapter that taking synesthesia to be a form of (multi- 
modal) mental imagery is not a merely verbal move. It is not just relabeling a 
familiar phenomenon. Taking synesthesia to be a form of mental imagery can 
help us understand how synesthetic experience can be triggered in various 
non-sensory ways. 

In the cases of synesthesia I have discussed so far, the synesthetic experi- 
ence in a specific sense modality is triggered by sensory stimulation (in the 
multimodal case, by sensory stimulation in a different sense modality). But as 
it turns out, synesthetic experience can be induced without any sensory stim- 
ulation. And, crucially, synesthetic experience in one sense modality can be 
induced by mental imagery in another sense modality (by sensorily imagin- 
ing something; for example, see Spiller and Jansari 2008; Spiller et al. 2015). 

In other words, it is not only, say, auditory sensory stimulation that can 
lead to visual synesthetic experience. Auditory mental imagery can also lead
to visual synesthetic experience. That is, early cortical activation in
one “sensory stream” can trigger synesthetic experiences in a different
“sensory stream,” regardless of whether this early cortical activation is triggered
by straightforward perceptual input or by perceptually imagining something.

Another example of multimodal mental imagery serving as the inducer 
of synesthetic experiences in the absence of direct sensory stimulation 
comes from grapheme-color synesthetes who can also have vivid color 
experiences, even when they are touching the graphemes (and don’t see 
them) (Newell 2013). This is a puzzling finding on the face of it, but it
can be explained in a straightforward manner in the present framework:
this is yet another instance of synesthetic experience in one sensory stream 
being triggered by mental imagery in another sensory stream (where this 
mental imagery is crossmodally triggered by tactile stimulus). The tactile 
stimulus triggers visual mental imagery of the grapheme and then the visual 
mental imagery of the grapheme induces the synesthetic experience of 
color. If we take the concurrent to be mental imagery, this diverse set of 
synesthetic experiences can all be explained in terms of an early cortical to 
early cortical influence. Even more importantly, if we consider synesthesia 
to be sensory stimulation-driven perception, these well-documented forms 
of synesthesia will not count as synesthesia at all. Taking synesthesia to be 
mental imagery allows us to explain important, but less-central cases of 
synesthesia as synesthesia. 

Another explanatory perk of taking this route comes from some seemingly 
odd cases of synesthetic experiences, where the trigger is neither sensory 
stimulation-driven perception nor perceptual imagining, but rather motoric 
imagining. The most famous example is swimming-style synesthesia: strong 
color experiences when seeing, thinking about, or imagining a swimming 
style: breaststroke, crawl, butterfly, etc. (Nikolić et al. 2011;
Mroczko-Wąsowicz and Werning 2012; Rothen et al. 2013).

In the case of swimming-style synesthesia, synesthetic experiences can be 
triggered in the absence of any kind of perceptual stimulus (it can happen 
even when your eyes are closed). But then what triggers these experiences? 
There seem to be two options, both of which would be compatible with the 
framework according to which synesthesia is a form of mental imagery 
(again, note that neither of these options is open to those who take
synesthesia to be sensory stimulation-driven perception).

The first option is that when you think about, say, breaststroke, you invol- 
untarily visualize a person swimming breaststroke and it is this involuntary 
visual mental imagery of somebody doing breaststroke that triggers the
synesthetic experience (like in the sensory imagination cases; see Spiller and
Jansari 2008; Spiller et al. 2015). 

The other option is that when you think about breaststroke, you have motor 
imagery of swimming in breaststroke—you imagine doing the breaststroke. 
This second option seems to be closer to the subjects’ descriptions of their 
experience. And in this case, it is motor imagery (imagining doing some- 
thing) that triggers mental imagery in a different “sensory stream” (in this 
case, color mental imagery). So this is, just like the previous example, an 
imagery to imagery influence—but the former imagery is motor imagery (see 
Chapter 27 for more on motor imagery). 

In short, if we accept the proposal that synesthesia is a form of mental
imagery, we get a form of explanatory unification with regard to the diverse
triggers of synesthetic experiences inasmuch as all of them can be explained
in terms of interactions between early cortical representations. I argued that
the concurrent is mental imagery. And this explains why the inducer is often
also mental (or motor) imagery.¹


1 There are additional explanatory advantages; for example, taking synesthesia to be a form of
(multimodal) mental imagery can explain the complex interactions between synesthetic experiences 
and some crossmodal illusions, like the double-flash illusion (Shams et al. 2000; Watkins et al. 2006; 
Roseboom et al. 2013). While synesthesia subjects are more likely to be fooled by crossmodal illusions 
like the double-flash illusion (which itself relies on multimodal mental imagery), there are some 
important exceptions and wrinkles (Innes-Brown et al. 2011; Brang et al. 2012; Neufeld et al. 2012; 
Newell and Mitchell 2016). Given that, according to my account, both synesthesia and the double- 
flash illusion count as (different) forms of multimodal mental imagery, this interaction (but also the 
exceptions) can be explained in a straightforward manner, but given the complexity of the issue, I will 
not do so here. 


17 


Pain 


The standard account of pain perception is that it is caused by some form of 
tissue damage. The tissue is damaged, the pain sensors, commonly referred 
to as nociceptors, get activated and send a signal to the central nervous sys- 
tem, and the processing of this pain signal gives rise to painful phenome- 
nology. You step on my toe, the nociceptors in my toe send a signal to my 
brain and the processing of this signal gives rise to the feeling of the pain 
in my toe. 

We have seen in Chapter 9 that what we pre-theoretically take to be percep- 
tion is really a mixture of sensory stimulation-driven perception and mental 
imagery. My aim is to argue that the same general picture is also applicable to 
pain perception. What we pre-theoretically take to be pain perception is also 
a mixture of sensory stimulation-driven (that is, nociception-driven) pain 
perception and pain imagery (that is, pain processing that is not directly trig- 
gered by nociceptors). 

What would count as pain imagery, that is, mental imagery in the context 
of pain perception? As we have seen, mental imagery is perceptual processing 
that is not directly triggered by sensory input. So when it comes to pain it 
would be cortical pain processing that is not directly triggered by nociceptors 
(that is, by the sensory stimulation of the pain receptors). 

In sensory stimulation-driven vision, the light hits our retina and this 
directly triggers early cortical visual processing. If there is perceptual process- 
ing in these regions that is not directly triggered by retinal input, we have to 
refer to it as visual mental imagery. Similarly, in sensory stimulation-driven 
pain perception, the nociceptors are activated and this directly triggers pain 
processing in clearly delineated cortical regions, especially the primary and 
secondary somatosensory cortices (S1/S2). The somatosensory cortices proc- 
ess tactile and nociceptive information and they do so very differently. So just 
by looking at the activations of the somatosensory cortices, we can tell 
whether the processed stimulus was a painful or merely a tactile one (Ploner 
et al. 2000). Crucially, if there is pain processing in these regions that is not
directly triggered by nociceptors, we have to talk about a specific form of mental
imagery: pain imagery.

Some instances of pain imagery are accompanied by the characteristic
(painful) phenomenal character. Some other instances (for example, most 
cases of imagining that one is in pain; see Derbyshire et al. 2004; Hoenen et al. 
2015) are not. We can have activation in the primary and secondary somato- 
sensory cortices that is very similar to nociceptor-driven pain processing that 
nonetheless does not lead to any painful phenomenology. Similarly, in the 
visual case, one can have activation in the early visual cortices without having 
the phenomenal feel of visualizing anything. But even if pain imagery is not 
accompanied by any painful phenomenology, it can still influence the phe- 
nomenal character of simultaneous nociceptor-driven pain processing. 

My claim is that pain perception is a hybrid of nociceptor-driven pain per- 
ception and pain imagery. Hence, pain imagery is of crucial importance in 
understanding pain perception. I will start with examples of pain perception 
where all we have is pain imagery. These are on one end of the spectrum, 
where nociception plays no role. 

Phantom limb pain (the pain one feels in amputated limbs) has been at the 
center of philosophical discussions of pain, partly because it seems to demon- 
strate that we can have pain even if the intentional object of this pain does not 
exist. But if we consider the phenomenon of phantom limb pain in the gen- 
eral theoretical framework I outlined above, it will very clearly count as pain 
imagery: the very well-documented activation of somatosensory cortices 
is blatantly not triggered by nociceptors (as the relevant nociceptors don't 
even exist). 

It is crucial to emphasize that this does not mean that phantom limb pain is 
any less real than nociceptor-driven pain: I am not doubting the reality of 
phantom limb pain at all when I am describing it as mental imagery.¹ We have
seen that mental imagery may or may not be conscious. And if conscious, it 
may or may not be accompanied by the feeling of presence. In the case of 
phantom limb pain, pain imagery is clearly conscious and it is also clearly 
accompanied by the feeling of presence. 

1 Note that I am not committed to the claim that phantom limb pain is hallucinated pain. Ned
Block argued at length that there is no such thing as pain hallucination (Block 2006). And while many
forms of hallucination will count as a form of mental imagery, according to the concept of mental
imagery I have been using (as perceptual processing that is not directly triggered by sensory input, see
Chapter 3), the form of mental imagery that is involved in pain perception in general, and in phantom
limb pain in particular, may be very different from the form of mental imagery that one may want to
label “hallucination.” See Nanay (2016a) for a discussion of what form of mental imagery
hallucination is.

And phantom limb pain is not the only instance of pain that is fully constituted
by pain imagery. Another example is the thermal grill illusion. This is
one of the oldest perceptual illusions involving pain: if the subject touches
three bars at the same time, the middle one cold and the others warm, she 
experiences burning pain where the middle bar touches her skin (Craig and 
Bushnell 1994). This is a clear example of pain imagery in my framework as 
the activation in S1/S2 is not triggered by nociceptors (in fact, nociceptors are 
not involved anywhere in the entire process, see Defrin et al. 2002; Marotta 
et al. 2015). Nonetheless, the subjects feel pain. Again, in the case of thermal 
grill illusion, pain imagery is not just one of the ingredients of pain. It is the 
only ingredient of pain. 

Further, Ramachandran’s famous mirror treatment of phantom limb pain 
is very easily explained in this conceptual framework. Ramachandran suc- 
cessfully treated many cases of phantom limb pain by making the patients 
place their hands (both the intact hand and the “phantom” hand) in a box, 
where they saw the movement of the intact hand reflected in a mirror exactly 
where the “phantom” hand was localized. The subjects were then asked to 
move the two hands simultaneously and this led to the alleviation of the phan- 
tom limb pain (see Ramachandran et al. 1995; Ramachandran and Rogers- 
Ramachandran 1996b; and for some more wrinkles, see Ramachandran and 
Rogers-Ramachandran 1996a; see also Giummarra and Moseley 2011 for a
summary of the circumstances under which this experiment could be replicated).

If we accept the theoretical framework I am proposing, Ramachandran’s 
mirror treatment amounts to an early example of treating pain with the help 
of mental imagery. As we have seen, phantom limb pain would count as men- 
tal imagery: it involves the activation of the somatosensory cortices without 
any activity from nociceptors. And what happens in Ramachandran’s mirror 
experiments could be described in the following manner. The experiments 
triggered subjects’ tactile mental imagery of their phantom limb with the help 
of their visual input (of what appeared to be their phantom limb in the mir- 
ror). And this tactile mental imagery is what (causally) modified the pain 
imagery that is responsible for the phantom limb pain (see also MacIver et al. 
2008 and Beaumont et al. 2011 for different ways of using mental imagery in 
alleviating phantom limb pain; and Moseley et al. 2008 for more details on 
Ramachandran’s mirror technique). 

This mirror-induced mental imagery was not voluntarily triggered. It was 
not like closing one’s eyes, counting to three and then visualizing an apple. It 
was triggered involuntarily by the mirror-trick of visual stimulus—the visual 
stimulus of what appeared to be the visual perception of the phantom limb. 
So while Ramachandran’s mirror treatment of phantom limb pain is an early 
instance of treating pain with the help of mental imagery, it amounts to 
treating pain with the help not of voluntarily conjured up visual or tactile 
imagery (like the experiments in MacIver et al. 2008; Fardo et al. 2015; and 
Volz et al. 2015), but rather of involuntary, crossmodally triggered mental 
imagery. This distinction will play an important role in Chapter 30, when dis- 
cussing the clinical applications of mental imagery. 

One might push back that these are marginal cases of pain perception: 
phantom limb pain is relatively rare (in comparison with non-phantom limb 
pain) and the thermal grill is just a clever illusion. Note, however, that much 
more widespread forms of pain perception also count as pain imagery. 

First of all, some forms of referred pain (pain in your knee caused by tissue 
damage in your thigh, for example) are clear cases of pain imagery: S1/S2 
activation indirectly triggered by nociception. The S1/S2 processing that 
corresponds to the referred location is triggered by the mediating S1/S2 processing 
that corresponds to the location of the actual nociception.2 

Second, neuropathic pain also counts as pain imagery. Neuropathic pain is 
caused by nerve damage between the skin and the somatosensory cortices, 
which leads to chronic pain (which often amounts to a burning feeling), in the 
absence of nociception. Neuropathic pain is, by definition, pain imagery: the 
pain processing is not directly caused by nociception because there is no 
nociception. And given that much of chronic pain (most typically chronic 
low back pain) involves neuropathic pain (Baron et al. 2016), this means that, 
for example, chronic low back pain is a hybrid of nociception-driven pain 
perception and pain imagery. More generally, we have strong empirical rea- 
son to believe that nociception-driven pain processing and pain imagery 
interact at various stages of pain processing (Ploghaus et al. 2003; Koyama 
et al. 2005; Goffaux et al. 2007; Atlas and Wager 2012; Carlino et al. 2014). 

Third, the theoretical framework I am proposing can give a simple and uni- 
fied explanation for the significant body of evidence from neuroscience that 
suggests that pain is very much dependent on a number of contextual cues 
(Carlino et al. 2014; Ploghaus et al. 2003). Philosophers often make a distinc- 
tion between the sensory and the affective components of pain (see Aydede 
2009)—but there is plenty of evidence that both of these depend on contex- 
tual cues. 

A relatively well-understood case of this context dependence is the effects 
of placebo and nocebo on pain: placebo alleviates pain and nocebo does the 
opposite (Benedetti et al. 2005, 2007). More generally, pain very much 


2 Note that not all referred pain works like this: sometimes it is the neural pathway leading from 
the nociceptor to S1/S2 that gets scrambled. 
depends on our expectations (Koyama et al. 2005; Goffaux et al. 2007; Atlas 
and Wager 2012; see also Peerdeman et al. 2016 for a meta-analysis) and there 
is more and more data on the neural mechanism of this process (Ploghaus 
et al. 1999; Sawamoto et al. 2000; Jensen et al. 2003; Keltner et al. 2006). The 
crucial finding from our point of view is that the expectations of pain 
significantly involve sensory cortical areas (S1/S2), which interact with the 
processing of pain input very early on in cortical processing (Porro et al. 2002; 
Wager et al. 2004). 

Hence, some expectations and anticipations of pain will count as pain 
imagery, as we have clear evidence that some expectations can activate S1/S2 
without any nociceptors being involved (Ploghaus et al. 1999; Sawamoto et al. 
2000; Porro et al. 2002; Wager et al. 2004; Keltner et al. 2006). It is important 
to emphasize that this does not mean that expectations and anticipations in 
general will have to be labeled as mental imagery. Many instances of expecta- 
tions and anticipations will not count as mental imagery—for example, if I 
make an appointment with my dentist for next month and I am anticipating 
the pain I will have to endure then, this will not count as an instance of pain 
imagery as long as the somatosensory cortical areas are not activated at all. 
But we have plenty of evidence that at least some expectations can activate the 
somatosensory cortical areas directly, without the involvement of nocicep- 
tors. These instances of expectations will count as pain imagery (see also 
Chapter 12 on expectations that count as mental imagery).3 

As we have seen, there are many studies that show that these expectations 
clearly involve early cortical activations (Ploghaus et al. 1999; Sawamoto et al. 
2000; Porro et al. 2002; Wager et al. 2004; Keltner et al. 2006). Our expecta- 
tions about the pain stimulus influence pain intensity as well as pain location 
and even the presence of pain (Ploghaus et al. 2003; Carlino et al. 2014; see 
also Peerdeman et al. 2016 for a summary). The general framework I argued 
for predicts these results: if pain perception is a mixture of nociceptor-driven 
perception and pain imagery, then, provided that some expectations count as 
mental imagery, we should predict complex and diverse interactions between 
expectations and pain. 

I argued that what we pre-theoretically consider to be pain is a mixture of 
nociceptor-driven processing and pain imagery. This view also has significant 
consequences for some philosophical debates about the nature and content of 


3 Further, hypnosis-induced pain will also count as pain imagery in this sense as it is not triggered 
by nociceptors, but the very same regions are active as in the case of nociceptor-driven pain process- 
ing (Derbyshire et al. 2004). Interestingly, the mental imagery here is conscious: the subjects under 
hypnosis do have painful phenomenology. 
pain. One big question about pain is about its representational content: what 
it represents and how it represents whatever it represents. 

I will assume that pain is a representational state—by which I mean that 
pain processing involves representations. This is not a particularly controver- 
sial assumption. A more controversial assumption would be to claim that the 
phenomenology of pain could be fully explained in terms of the representa- 
tional content of pain. But I will not say anything about whether the represen- 
tational content would fully or partially (or not at all) explain the 
phenomenology of pain. If pain states are representational states, then the 
question is: what kind of representational states are they? 

There are two major proposals here, corresponding, roughly, to attributing 
mind-to-world direction of fit, or world-to-mind direction of fit, to pain. 
Some representations represent the world as being a certain way. 
Representations of this kind have a “mind-to-world” direction of fit. Some 
other representations, in contrast, have a “world-to-mind” direction of fit: 
they do not describe how the world is, but prescribe how the world is sup- 
posed to be. The question is, then: does pain have a mind-to-world or a 
world-to-mind direction of fit? Or, as the question is more often raised in the 
pain literature, does pain have indicative or imperative content? 

According to some, the content of pain states is indicative content: it rep- 
resents some states of affairs (standardly: the tissue damage or bodily distur- 
bance) in some way. It has a mind-to-world (or belief-like) direction of fit: it 
“describes” a state of affairs (Tye 1995; Cutter and Tye 2011; Bain 2013). 
According to others, the content of pain states is imperative content: it does 
not describe a state of affairs (or does not merely describe a state of affairs), but 
rather prescribes a course of action (standardly, that the agent sees to it that 
the bodily disturbance (or the pain experience) is gone). It has a world-to- 
mind (or desire-like) direction of fit (Klein 2007, 2012, 2015; Hall 2008; 
Martinez 2010; Klein and Martinez 2018; Barlassina and Hayward 2019).4 
These are the two major proposals, but there are many ways of substantiating 
both and many versions of both the indicative and the imperative theories of 
pain (and things are even more complicated as there are also hybrid views that 
posit imperative content for some components of pain and indicative content 
for others—for discussion, see Hall 2008; Martinez 2010; and Bain 2011). 


4 I use “having indicative content” and “having content with mind-to-world direction of fit” 
interchangeably (ditto for “having imperative content” and “having content with world-to-mind direction 
of fit”). I bracket some recent controversies about the usefulness and coherence of the term “direction 
of fit” in what follows (see Frost 2014). See Chapter 25 for more discussion of the concept of 
“direction of fit.” 
How does the conceptual framework I argued for above help with this 
debate? I want to argue that it poses some challenges for imperativism, but 
not for indicativism. I do not think that it provides a knock-down argument 
against imperativism, but it draws attention to a number of questions impera- 
tivism would need to answer. 

Very simply stated, if pain has indicative content there is no problem, as 
pain imagery also has indicative content: it represents some state of affairs 
in some way (see Chapter 25 and see also Searle 1983, pp. 13-14; but see 
Langland-Hassan 2015). But if pain has imperative content, then we get a ten- 
sion between the direction of fit of pain (world-to-mind) and the direction of 
fit of pain imagery (mind-to-world). Imperativists would need to say more 
about how these two claims could fit together (I explore some of the options 
available to the imperativists in Nanay 2017b—spoiler: none are too promising). 

In short, if what we pre-theoretically take to be pain perception is really a 
mixture between nociceptor-driven pain processing and pain imagery, then 
imperativists about pain would need to say much more about a number of 
details that are not made explicit in their account. And, as we have seen, 
indicativism has a straightforward way of accommodating the picture I 
argued for in this chapter, according to which pain perception is a mixture of 
nociception-driven perception and pain imagery. 


18 
Object Files 


Perceptual psychology makes a threefold distinction between low-level, 
mid-level and high-level perceptual representations. Low-level representations 
track features like contours, colors, and shapes. High-level representations are 
responsible for categorization and mid-level representations are somewhere 
in between (see Anderson 2020 on the concept of mid-level visual 
representations). 

How does the concept of mental imagery map onto this hierarchy? It would 
be tempting to identify mental imagery with low-level perceptual representa- 
tions, given the emphasis on early cortical processing. But the low-level/mid- 
level/high-level distinction is about what is represented, not about where in 
the perceptual system something is represented. In this chapter, I argue that 
the concept of mental imagery could help us to explain many of the crucial 
aspects of mid-level vision. 

Mid-level vision is neither about features, nor about categories. It is about 
“objects” (whatever that means; see Carey and Xu 2001; Gao and Scholl 2010; 
van Dam and Hommel 2010; Green 2018). Mid-level perceptual representa- 
tions are usually referred to as “object files.” 

It is difficult to overstate the crucial role the concept of object files has 
played in vision science of the last forty years. Object files are representations 
that sustain reference to external objects and keep track of the properties of 
these objects. The concept of object file, like many of our concepts about the 
mind, is metaphorical: the general idea is that, just as the police have a file for 
a criminal that they update with new information as new information about 
this criminal comes in, our perceptual system opens files about the objects we 
perceive and then keeps updating these in the light of incoming information. 

The concept of object file was introduced in a paper published in 1983 by 
Anne Treisman, Daniel Kahneman, and Jacquelyn Burkell. They defined 
object file as “the temporary representation in which the information that 
pertains to a particular object accumulates and is updated when the object 
changes” (Treisman et al. 1983, p. 531). An object file is “a temporary episodic 
representation, within which successive states of an object are linked and 
integrated” (Kahneman et al. 1992, p. 175). 
A striking feature of these definitions is that it is not clear what makes these 
object files different from plain perceptual representations. And here many 
advocates of object files argue that in the case of object files, two representa- 
tional components are clearly distinguished: one representational component 
represents the object, without its properties. And the other represents the 
properties of this object (see, for example, Hommel 2004; Quilty-Dunn and 
Green forthcoming). Here is one typical statement of this requirement on 
object files: “the visual system need not know (i.e., need not have detected or 
encoded) any of their properties [that is, the properties of the objects] in 
order to implicitly treat them as though they were distinct and enduring 
visual tokens” (Pylyshyn 2009, p. 267). 

The main experimental reason for making this distinction comes from 
multiple object tracking. Here, the most important experiment shows that the 
visual system keeps track of a visual object and its properties even through 
disappearances that are due to occlusion (Flombaum and Scholl 2006; Yi et al. 
2008). So a red triangle is moving left, disappears behind the right side of a 
blue rectangle and then reappears on the left side of the rectangle (maybe 
even having changed color or shape). The visual system keeps track of the red 
triangle while it is behind the blue rectangle. But while doing so, its represen- 
tation of the object (the triangle) must be separate from the representation of 
its features (like the color red) as the visual system at that point represents 
only the color blue (as the triangle is fully occluded behind the blue rectan- 
gle). Thus, the argument would go, one part of the object file represents the 
object, the other represents its properties. 

In the light of the discussion in the preceding chapters, it should not come 
as a surprise that the representation of the red triangle behind the blue rec- 
tangle is mental imagery. It is early cortical perceptual representation not 
directly triggered by sensory input. It is not directly triggered by sensory 
input as the only sensory input at that moment comes from the blue rectan- 
gle. The representation of the red triangle is triggered, very much indirectly, 
by the earlier episode of the red triangle disappearing behind the blue 
occluder (see the discussion in Chapter 12 on temporal mental imagery). 

Further, we know from a variety of findings concerning amodal completion 
that the amodally completed visual object is represented in early visual cortices 
and most often already in V1 (see Chapter 8 and Nanay 2018b for a summary). 
So the occlusion findings are not just consistent with the hypothesis that object 
files would count as mental imagery, they even predict this claim. 

Could we then just conclude that object files constitute yet another percep- 
tual phenomenon that would count as mental imagery? Not so fast. Object 
files are posited in order to explain a number of visual phenomena besides 
multiple object tracking. The most important two of these are object-specific 
preview benefit and trans-saccadic memory. And it is not obvious that mental 
imagery can do all the explanatory work in these two cases. I take them 
in turn. 

Seeing a stimulus makes us recognize the same (or even just related) stim- 
uli more quickly. But there is a more specific experimental setup that has 
played a key role in the object file literature. This experimental setup is called 
the “object reviewing paradigm.” You see two circles in different parts of the 
visual display (one at the top, one at the bottom) and the stimulus (a small 
picture or a letter) flashes briefly in one of the circles. Then the circles move to 
different parts of the display, for example, the one at the top moves to the left 
and the one at the bottom moves to the right. 

After this phase, the original stimulus (the small picture or the letter) is 
flashed again in one of the circles. We have no problem recognizing the origi- 
nal stimulus, regardless of which circle it was originally presented in. But if it 
was presented in the same circle before, we are quicker to do so. This phe- 
nomenon is called the “object-specific preview benefit.” So if the letter A was 
briefly presented in the top circle, which then moved to the left, then we are 
quicker if the letter A is presented in the circle on the left than if it is pre- 
sented in the circle on the right. 

In other words, the visual system somehow binds the letter A to the circle 
at the top and then keeps track of this connection as the circle moves around, 
so that when the A is flashed in the same circle again, it is quicker to recog- 
nize it. This is the experimental paradigm that led to the positing of object 
files. When we see the circle at the top and the circle at the bottom, our per- 
ceptual system opens two object files, one for the circle at the top and one for 
the circle at the bottom. When the letter A flashes in the circle at the top, this 
information is filed in the object file of the circle at the top. And it remains 
filed when this circle moves to a different location. As this A is still in the 
object file of this circle when it is on the left side of the display, when the letter 
A is flashed again, it is more easily and quickly recognized. This is the classic 
story about why we should posit object files. 

But there are a number of ambiguities in this story. Object files were pos- 
ited in order to explain the object-specific preview benefit phenomenon. 
Thus, it leaves open the question about just what kind of representations are 
the ones that bind the letter A to the circle at the top. The experiments show 
that there must be some kind of representation that binds the letter A to this 
circle (otherwise we wouldn't be quicker if the letter A flashed in that circle 
and not the other one). But they do not specify what kind of representation 
this may be. 

Could this representation be mental imagery? One reason to resist this 
claim would be to point to findings that allegedly show that abstract or, in 
any case, amodal (which, in this context, means modality-independent) 
features need to be represented in these experiments (see Quilty-Dunn 2016, 
2020; Green and Quilty-Dunn 2021 for summaries; see also Echeverri 2016).1 
And as mental imagery can’t represent features of this kind, object files can't 
be mental imagery. In order to address this worry, I want to focus on some 
variations of the object-specific preview benefit experiments. 

The first experiment I want to talk about uses the same general setup as the 
object-specific preview benefit experiments above, but with a minor modifi- 
cation. You see the two circles at the beginning, and a picture of a cat is flashed 
in the circle at the top. Then the circle at the top moves to the left and the cir- 
cle at the bottom moves to the right, but instead of seeing the same cat picture 
again, you hear a meowing sound either from the left (where the circle at the 
top ended up) or from the right. And the finding is that subjects are quicker 
to identify the meowing sound as a match with the cat picture if it came from 
the left (Jordan et al. 2010; see also Zmigrod et al. 2009). The authors con- 
clude that object files (the representations that make this performance possi- 
ble) “store object-related information in an amodal format that can be flexibly 
accessed across senses” (Jordan et al. 2010, p. 500; again, amodal here means a 
format independent of any sense modality, it has nothing to do with amodal 
completion). 

Given that mental imagery does not store information in an amodal format 
and given that it cannot be flexibly accessed across senses, one might think of 
this experiment as showing that object files, whatever they are, would not 
count as mental imagery. 

But this would be too quick. We know from a vast number of studies that 
sense modalities interact laterally at very early stages of perceptual process- 
ing. One nice experimental illustration of this point is the double flash illu- 
sion, which we have already encountered in Chapter 13: You are presented 
with one flash and two beeps simultaneously (Shams et al. 2000). So the sen- 
sory stimulation in the visual sense modality is one flash. But you experience 
two flashes and already in the primary visual cortex, two flashes are processed 


1 See also Beck (2015) for a way of resisting the claim that amodal (again, meaning modality- 
independent) representations must be propositionally structured. See also Phillips (2020) for some 
orthogonal issues with Green and Quilty-Dunn’s argument. 
(Watkins et al. 2006). The processing of the two flashes is almost simultane- 
ous with the two beeps, which shows that this really is a lateral influence (that 
the auditory information does not go up the auditory hierarchy to form an 
amodal representation and then trickle down in the visual hierarchy to influ- 
ence the primary visual cortex). 

Here is another experiment I have already mentioned in Chapter 13, which 
is even more closely related to the Jordan et al. (2010) study. Subjects are 
blindfolded and they hear various familiar noises (sounds quite similar to the 
meow auditory stimulus and the ringing phone auditory stimulus used in the 
Jordan et al. (2010) study), while lying in the fMRI scanner. The important 
finding is that the activation of the primary visual cortex of these 
subjects is very different, depending on what auditory stimulus they hear 
(Vetter et al. 2014). 

In the light of these findings (and, again, there are many of these, mapping 
out all the lateral early influences between all possible sense modalities, see 
Chapter 13), the Jordan et al. (2010) experiment can be explained in a much 
more straightforward manner. Only the cat image is stored. But when the 
subject hears the meowing sound from the left, this promptly activates the 
early visual cortices (as we have seen from the Vetter et al. (2014) study, even 
the primary visual cortex). And this cat content-specific activation of the pri- 
mary visual cortex latches onto the stored early cortical cat image representa- 
tion on the left, which explains the quicker reaction if the meowing sound 
comes from the left. No need to postulate anything over and above mental 
imagery. 

Another experiment that could be taken to indicate that mental imagery is 
not enough to explain all there is to be explained about the object-specific 
preview benefit modifies the same experimental setup slightly differently. You 
see the word “fish” flashed in the circle at the top, it moves to the left and then 
instead of the same word, an image of a fish is presented on the left. Even in 
this setup we get the object-specific preview benefit (Gordon and Irwin 2000). 

This might seem like a rock-solid argument that it must be something like 
the representation of the abstract concept fish that is triggered by seeing the 
word “fish” and this abstract concept is stored in the object file and makes it 
easier to recognize the fish image. And as mental imagery does not store 
abstract concepts, object files must be very different from mental imagery. So 
the argument would go. 

Again, there is a much simpler and more straightforward explanation. We 
know a lot about the ways in which linguistic labels change (and speed up) 
perceptual processes and we also know a fair amount about the timescale of 
this influence. The crucial finding, both from EEG and from eye 
tracking studies, is that linguistic labels influence shape recognition in less 
than 100 milliseconds (Boutonnet and Lupyan 2015; de Groot et al. 2016; 
Noorman et al. 2018; it should be acknowledged that in these experiments, 
the onset of the linguistic label preceded the onset of the shape to be recog- 
nized). This is a very similar time frame to how long it takes for the stimulus 
to reach V4 (Zamarashkina et al. 2020)—that is, extremely fast (note that 
word recognition does take significantly longer, see Hauk et al. 2012). 

Crucially, this time frame of less than 100 milliseconds is much, much 
shorter than the time that would be needed for perceptual processing to reach 
all the way up to higher-level representations and then trickle all the way 
down again to the primary visual cortex (see Thorpe et al. 1996 and Lamme 
and Roelfsema 2000 for the temporal unfolding of visual processing in uni- 
modal cases; and see Kringelbach et al. 2015 for a summary of the relative 
slowness of non-early cortical processing). 

To give you a comparison, we have seen that the amodal completion of 
simple shapes, like the Kanizsa triangle, is taken to be a lateral (and not a top- 
down) process on the basis of timing studies, although it happens slightly 
slower than 100 milliseconds (within 100-200 milliseconds of retinal stimu- 
lation (see Chapter 8)). If the 100-200 milliseconds timescale of amodal com- 
pletion can be explained in terms of lateral influence, without postulating 
higher-level representations that mediate this process, then the less than 100 
milliseconds of the influence of linguistic labelling can also be explained 
without postulating higher-level representations that mediate this process. 

One might wonder how this lateral influence works. Why does a meow 
trigger the image of the cat? And why does the group of squiggles that looks 
like the word “fish” trigger the image of a fish? Given the very short time 
frame, the most likely explanation would be some low-level form of associa- 
tion. But we need to make a distinction between different kinds of associa- 
tions. We know that if you encounter, say, salt and pepper together a lot, then 
being exposed to salt (or the word “salt”) can trigger the activation of the 
word or the image of pepper. The same goes for Tom and Jerry or Romeo and 
Juliet. I will call associations of this kind unbound associations. 

Contrast these with what I will call bound associations (for a similar dis- 
tinction, see Colzato et al. 2006; Hommel and Colzato 2009). If two properties 
are often co-instantiated in the same object or event, this creates a bound 
association. I use the term “bound” because if property P and property R have 
been bound to the same individual a lot before, then instantiating property P 
can lead to the representation of property R. The association between the 
shape of the banana and the color yellow is a bound association: we have seen 
these two properties co-instantiated in the very same object (a banana) a lot 
before. And, as a result, seeing a banana shape tends to trigger the representa- 
tion of the color yellow. The association between Tom and Jerry (or salt and 
pepper) is unbound association as these associated terms have not been 
bound to the same individual. We know from a number of empirical studies 
that bound and unbound associations work very differently (see Rappaport et 
al. 2013 for a summary). 

The kind of association that plays a role in the object-specific preview 
benefit experiments with the meowing sound and the word “fish” is a bound 
association. We have experienced the auditory stimulus of meows and the visual 
stimulus of cats together a lot. But this is not what explains the effect. 
Crucially, the auditory property of meowing and the visual property of a cat 
shape have been bound together before a lot by our perceptual system. 
Similarly, we have also experienced the word “fish” and the image of a fish 
together a lot (starting with picture books for toddlers). But, again, this is not 
what explains the effect. What explains it is that these two properties have 
also been bound to the same individual before. So this influence from the 
word “fish” to the image of the fish does not need to be mediated by high-level 
semantic representations. Nor is it a common or garden semantic association. 
It is a bound association. This would also explain one of the earliest object- 
specific preview benefit studies, where the preview is a capital letter and the 
target is a lowercase letter (Gordon and Irwin 1996, experiment 3). Again, “A” 
and “a” have been bound to the same sound. 

It could be, and has been, argued (see, for example, Quilty-Dunn 2016, 
2020; Green and Quilty-Dunn 2021) that association can't be the right expla- 
nation, because no effect was found in yet another modification of the object- 
specific preview benefit experiment, where the preview and the target were 
words that were supposed to be linked by association. This study used 
words—for example, the words “doctor” and “nurse”—which are, according 
to the study, associatively linked (Gordon and Irwin 1996, experiment 4). The 
association between the words “doctor” and “nurse” is an unbound 
association. The association between a short word and an image that refers to the 
same thing and the association between a lowercase “a” and an uppercase “A” 
are, as we have seen, bound associations. The object-specific preview benefit 
is not sensitive to unbound associations, but it is very much sensitive to 
bound associations. 

Here is one last piece of evidence against any form of higher-level explana- 
tions of object-specific preview benefit. The authors whom the proponents of 
these higher-level explanations like to cite did conduct a couple of experi- 
ments that undercut these higher-level explanations. In the first of these, the 
preview is a synonym of the target and in the second, the preview is a subcat- 
egory of the target (Gordon and Irwin 1996, experiments 5 and 6). So in the 
first experiment, the subject sees the word “cab” in the circle at the top and 
then the target stimulus is the word “taxi.” In the second experiment, the 
preview in the circle at the top is a word, like “robin,” which is a subcategory of 
the more general concept that is used as the target stimulus, like “bird.” If 
semantic information were coded in the object file, these experiments would 
result in object-specific preview benefit: if the amodal and abstract concept of 
cab or of robin were placed in the object file, it should make the recognition 
of the synonym of “cab” and the general category of “bird” quicker. But the 
findings show that there is no object-specific preview benefit in these cases. 
This indicates that higher-level representations are not involved in the object- 
specific preview benefit. 

To sum up, the object-specific preview benefit paradigm does not justify 
positing a new kind of representation over and above mental imagery (which 
is very much consistent with the well-demonstrated dissociation between 
conscious perception and object-specific preview benefit results; see Mitroff 
et al. 2005). 

The second perceptual phenomenon that is supposed to justify the positing 
of object files is trans-saccadic memory. When we visually explore the scene 
in front of us, we move our eyes. And each time we move our eyes, our visual 
system needs to remap the contents of our visual array. Suppose that you are 
looking at a fixation cross and there is a triangle at the extreme right-hand 
side of your periphery. When you move your eyes and fixate on the triangle, 
the triangle is in your fovea and the fixation cross is at the extreme left-hand 
side of your peripheral vision. 

The transition between these two states is not trivial. At first, the triangle 
showed up on the right-hand side of your visual field and then it was bang in 
the middle of it. But your visual system takes this triangle to be the same. So 
the visual system needs to somehow keep track of this triangle in a way that 
would assure that its identity is preserved across the radical change in which 
part of your visual field it shows up. 

This phenomenon is called trans-saccadic memory, because the visual sys- 
tem needs to remember from the beginning of the saccade to the end of the 
saccade what visual objects are where in the visual field. It needs to remember 
that the visual object that it saccades to, and that is about to appear in the 
middle of the visual field, is a triangle. In order to explain this phenomenon 
of trans-saccadic memory, we do need to posit a representation that remains 
stable from one saccade to another. This is another important reason why 
people in the empirical literature posit object files. Here is a representative 
statement: “corrective saccades are executed on the basis of object files” (Schut 
et al. 2017, p. 138). The question is whether any new and distinctive represen- 
tations need to be posited to explain trans-saccadic memory over and above 
representations that we have reason to posit anyway, like mental imagery. 

It has been argued that trans-saccadic memory, just like object-specific pre- 
view benefit, involves higher-level representations that code for abstract fea- 
tures. I want to go through two key experiments that could be, and have been, 
taken to support this approach and point out that both are consistent with the 
claim that the only representations that are needed to account for trans-saccadic 
memory are mental imagery and other early cortical representations. 

The experimental paradigm that is used to examine trans-saccadic memory 
involves fixating on a cross at the middle of the screen and then presenting a 
stimulus at the periphery. When the subject saccades to this stimulus, the 
experimenter changes some features of this stimulus. The subject won't notice 
this change as it happens while the saccade takes place. The question is, how 
much this stimulus can change while still counting as the same object. This is 
tested, as in the case of the object-specific preview benefit experiments, by 
measuring the reaction time of naming the object in the periphery. If you see 
a cat picture in the periphery, and you saccade to it, you will be quite quick to 
name it if it remains the same cat picture. If the experimenters turn the cat 
picture into a picture of a book during the saccade and you need to name the 
object depicted after the saccade (you have to recognize it as a book), your 
reaction time will be slower. The general idea is that you open an object file 
before the saccade, and you update this very object file after the saccade. If the 
information after the saccade is very different (as it is when the cat picture is 
replaced by the book picture), this slows you down. 

Here is a twist in this experimental setup. A picture of the lowercase letter 
(“a”) is flashed in the periphery. Then you saccade to this picture, but during 
the saccade, it is replaced by the uppercase letter (“A”) (Rayner et al. 1980, see 
also Pollatsek et al. 1984). The lowercase letter sped up the recognition of the 
uppercase letter. This allegedly shows that the information stored in the object 
file is abstracted away from the specific low-level features of the letters. So, 
something like an abstract category of the grapheme is encoded in the 
object file. 

As in the case of object-specific preview benefit experiments, these higher- 
level explanations are unwarranted as a much simpler explanation is available 
purely in terms of mental imagery. In fact, we can use an explanatory scheme 
very similar to the one we used in the case of object-specific preview benefit 
to explain this effect. The only thing that gets encoded in trans-saccadic 
memory is the image of the lowercase letter (“a”). No abstract category gets 
encoded. But when the subject is presented with the uppercase letter (“A”), the 
old image makes it easier to recognize it as a result of the bound association 
between the two. None of this requires any representation of any kind of 
high-level or abstract concept. 

The second experiment I need to mention used a slightly more complicated 
setup. The subject fixated on the cross at the middle of the scene and not one 
but two stimuli were presented in the periphery, quite close to one another 
(like the numbers 2 and 3 on a clockface). This was the pre-saccade ensemble 
of two stimuli. The subject then had to saccade to these and during the sac- 
cade these two stimuli were replaced by a single one in between the two old 
stimuli. The reaction time for naming this new stimulus was significantly 
lower if the new stimulus was the same as one of the old stimuli. So if a cat 
image and a book image were presented in the periphery and then during the 
saccade they were replaced with a single book image, the subject was quicker 
than if it was replaced by an image of a truck. 

In the case of this experiment, the crucial step comes from changing the 
color of this object. So when the image of a cat and the image of a book were 
replaced with a single image of a book, the color of this book image changed. 
The question was whether this color change had any effect on the reaction 
time. And the results were surprising in that in some cases it had an effect, in 
others it didn’t have any. It depended on whether the color that changed was 
what is referred to as “diagnostic color”: color that is exceptionally typical of 
the object kind in question. So if you see a yellow book, the yellow color is not 
diagnostic—it is not an exceptionally typical trait of books that they are 
yellow. But if you see a yellow banana, the yellow color is diagnostic—it is 
an exceptionally typical trait of bananas that they are yellow. Yellow is a 
diagnostic color of bananas, but not of books. 

In those experiments where the presented object in question had diagnos- 
tic color, changing this diagnostic color slowed down the reaction time in 
spite of the fact that the very same image was presented after the saccade (but 
colored differently). So seeing a yellow banana pre-saccade (as one of the two 
stimuli) and then seeing a blue banana post-saccade slowed down the reac- 
tion time. But the same slowing down did not happen if the object in question 
did not have a diagnostic color: seeing a yellow book pre-saccade (as one of 
the two stimuli) and then seeing a blue book post-saccade did not slow down 
the reaction time. Again, the high-level explanation is that this experiment 
shows that the abstract category of “banana” is represented in trans-saccadic 
memory. I don't think this follows. The only thing that is represented in trans- 
saccadic memory is the image of the yellow banana. And when the blue 
banana is presented, it is more difficult to recognize given the ongoing repre- 
sentation of the yellow banana in trans-saccadic memory (as the yellow 
banana creates different visual expectations from a yellow book, involving the 
color yellow, and this expectation is frustrated by the blue banana, but not the 
blue book; see, for example, Rappaport et al. 2013 on how searching for 
bound features (like diagnostic color) is much quicker than searching for 
unbound features). None of this requires high-level abstract representations 
of banana-ness in trans-saccadic memory (see also Khayat et al. 2004, Supèr 
et al. 2004, Malik et al. 2015 for the ways in which trans-saccadic memory 
relies on the early visual cortices). 

Here is an analogy that might be helpful for explaining the role mental 
imagery plays in these two experimental paradigms. Imagine a ball that has a 
letter A painted on one side. The ball rolls on, and the letter A is no longer 
visible because it is now at the side of the ball that is facing away from us. 
We know from the amodal completion studies that it is still represented 
amodally—by means of mental imagery. The same goes for the object-specific 
preview benefit cases.² And, mutatis mutandis, for trans-saccadic memory. 

The general gist of my argument is that we should not underappreciate 
mental imagery. Mental imagery is intricately complex. And it can do all the 
jobs that object files were posited to do. 


2 This effect can last for several seconds (Noles et al. 2005), a result very much consistent with new 
findings about the early visual cortices (Fritsche et al. 2022). More generally, the similarity in the 
mechanisms of amodal completion and the persistence of object files is further emphasized by the 
experiments in Yi et al. (2008) and Flombaum and Scholl (2006). 


PART IV 
COGNITION 
Language 


Mental imagery is a perceptual phenomenon, but it has important uses in 
post-perceptual processing and in our cognition in general. Part II and Part 
III of the book were about the important role mental imagery plays in percep- 
tion. This part takes a broader perspective and considers the various roles 
mental imagery plays in cognition. 

Throughout the history of philosophy, imagistic mental representations 
have been routinely contrasted with abstract, linguistic representations (see 
Yolton 1996 for a summary). The background assumption is that there is a 
sharp contrast between two different kinds of mental representations: imagistic 
ones, like mental imagery, and abstract, linguistic ones. There is imagistic 
cognition and there is linguistic cognition and they are very different. So 
when we talk about the importance of mental imagery in human cognition, 
the reach of mental imagery is limited as there is an extra layer of mental 
representations, abstract, linguistic ones, which have nothing to do with men- 
tal imagery. 

Just how far imagery reaches and how thick this layer of abstract linguistic 
representations is supposed to be is subject to debate. One way of thinking 
about the mind in general and mental representations in particular is to 
model it on language. We have a relatively clear idea about how language rep- 
resents. So a tempting route would be to use that as a means to describe the 
way the mind represents as well. This way of thinking about the mind was 
very influential in philosophy in the 1960s and 1970s, when almost all the 
most influential philosophers of mind came from a philosophy of language 
background. 

This way of thinking about the mind takes propositionally structured, 
language-like representations to be the default form of representations in 
general and mental representations in particular. So the mind represents by 
means of propositionally structured, language-like representations. Beliefs 
are propositional attitudes, so they do exactly this—as do desires. This way of 
thinking about mental representations either flat out denies that the mind 
could represent in any other way (which would make perception and imagery 
either propositionally structured or not a representation) or when it allows 
for the existence of non-propositional representations, say, perception or 
mental imagery, it downplays their importance. This may be one of the reasons 
why the obsessive emphasis on language at the middle of the twentieth 
century sidelined the philosophical study of mental imagery. 

Mental Imagery: Philosophy, Psychology, Neuroscience. Bence Nanay, Oxford University Press. © Bence Nanay 2023. 
DOI: 10.1093/oso/9780198809500.003.0019 

But here is another way of thinking about the mind. The human mind is 
not that different from animal minds. In any case, it has evolved from animal 
minds, so, in order to understand the exquisite complexity of the human 
mind, we should start with understanding something simpler: the way the 
animal mind represents. Once we have fully understood that, we can then, 
and only then, address the uniquely human fancy features, like language. 

The way animals (at least mammals) perceive is very similar to the way we 
perceive. And the way animals (again, at least mammals) exercise their men- 
tal imagery is also very similar to the way we do so. So the default for under- 
standing how the human mind represents should not be propositionally 
structured linguistic representation, but rather imagistic representation of the 
kind that perception and imagery use. When we have fully understood how 
imagistic representations work, how they interact with each other, and how 
they lead to action, then and only then can we begin to address the fancy 
gloss on top of this fundamental representational machinery, which is 
uniquely human (Nanay 2021d). 

I once described the uniquely human features of the human mind, like lan- 
guage, as the icing on the cake (Nanay 2013a): when we try to analyze the 
cake, we should not start with the examination of the icing, we should begin 
with the cake itself. Understanding various features of the icing is a nice extra 
perk. But if you make inferences about the cake itself from what you know 
about the icing, you'll get it all wrong. 

While I have always sided with this second way of thinking about the mind, 
and I still do, I think we have very strong reason to question the strict opposi- 
tion of the imagistic and propositional parts of the mind: of the cake and the 
icing. Not only is the icing of language a very minor part of understanding the 
mind. It also relies heavily on imagistic representations, so much so that lan- 
guage processing cannot be fully understood without understanding mental 
imagery. To use the icing and the cake analogy one last time, the icing, it turns 
out, is made of many of the same ingredients as the cake itself. This is not 
particularly surprising: the fancy gloss of uniquely human mental capacities 
like language processing has evolved from the imagistic animal mind, so these 
capacities had to use the ingredients that were already present. They had to 
have imagistic representations as their starting point. 
In some sense, this is even worse news for the friends of using language as a 
means to understand the mind. The way the mind processes language relies 
heavily on mental imagery. So, taking the way language represents as a start- 
ing point won't help us to understand the vast majority of our (non-linguistic) 
mental representations. But it won't help us to fully understand language pro- 
cessing itself either. 

I don't take this way of thinking about the mind to be particularly radical 
or extreme. Tyler Burge famously said that “representation of physical entities 
in language and thought is the way it is largely because representation in per- 
ception is the way it is” (Burge 2009, p. 293). David Kaplan also says some- 
thing very similar when he writes: 


Many of our beliefs have the form: “The color of her hair is ___” or “The 
song he was singing went ___”, where the blanks are filled with images, 
sensory impressions, or what have you, but certainly not words. If we cannot 
even say it with words but have to paint it or sing it, we certainly cannot 
believe it with words. (Kaplan 1968, p. 208) 


My claim is that this reliance of language on imagistic representations is not 
an exception, it is the rule.¹ We now know that language processing is not 
completely detachable from imagistic cognition. Both generating linguistic 
utterances and hearing/reading them utilize mental imagery. Some of the 
empirical findings supporting these claims come from neuroimaging. 
Describing a scene relies on our ability to generate mental imagery—early 
cortical representations not directly triggered by sensory input (Mar 2004; 
Zadbood et al. 2017). Further, understanding a description invariably involves 
mental imagery—again, not necessarily conscious mental imagery, but early 
cortical representations not directly triggered by sensory input. The crucial 
finding here is that it is this imagistic representation that is remembered, not 
the words we heard (Zwaan and Radvansky 1998; Zwaan 2016; Zacks et al. 
2018; McClelland et al. 2019), which shows that mental imagery is not a mere 
byproduct of language processing, but is an important ingredient thereof. 

1 An emerging body of findings about iconic representations in language itself (especially in languages 
other than Indo-European ones and in sign language) gives further support to this view; see, 
for example, Perniss et al. (2010) and Schlenker (2017). 

We understand fairly well how this happens. As we have seen in Chapter 18, 
linguistic labels change (and speed up) perceptual processes and both EEG 
and eye tracking studies show that linguistic labels influence shape recognition 
in less than 100 milliseconds (Boutonnet and Lupyan 2015; de Groot et al. 
2016; Noorman et al. 2018). This means that linguistic and imagistic repre- 
sentations interact at an extremely early stage of perceptual processing—by 
any account in early cortical processing. Again, all this indicates that imagis- 
tic and linguistic cognition are far from being independent from one 
another—they are deeply intertwined even at the earliest levels of perceptual 
processing (see also Seydell-Greenwald et al. 2020). 

While many of these neuroimaging and timing results are relatively new, 
the intimate connections between linguistic and imagistic representations 
have long been postulated in behavioral studies. 

Probably the most famous of these go under the heading of “dual coding 
theory” (Paivio 1971, 1986; Just et al. 2004). According to the dual coding 
theory, linguistic representations themselves are partly constituted (or at least 
necessarily accompanied) by mental imagery and this explains why concrete 
words (that are accompanied by more determinate mental imagery) are easier 
to recall than abstract words (that are accompanied by less determinate and in 
some cases very indeterminate mental imagery). 

Dual coding theory started with studies of the cognitive underpinnings of 
mnemonic abilities: the reasons why some people are better at remembering 
words than others. And while the ways in which remembering words cor- 
relates with mental imagery capacities were studied, it turned out that some 
words are systematically more difficult to remember than others. The exam- 
ination of a vast dataset of words as well as a vast number of subjects shows 
that there is a correlation between how easily a word is remembered and how 
abstract/concrete it is. Abstract words like “homology” are more difficult to 
remember and concrete words like “homeowner” are easier to remember, 
even if we control for the frequency of occurrence in language. And dual coding 
theory explains this difference in terms of the reliance of language processing 
on imagery: concrete words are remembered more easily because their 
processing involves concrete mental imagery, which makes them easier to 
remember. 

Paivio’s dual coding theory posited the importance of mental imagery in 
linguistic processing to explain the behavioral differences between the recall 
of concrete and abstract words. But the findings of the dual coding theory are 
exactly what we should expect, given the more recent findings about the auto- 
matic and lateral triggering of early cortical representations in early stages of 
language processing. These older behavioral results and the more recent tim- 
ing and neuroimaging findings paint the same picture: language processing 
itself essentially involves mental imagery (see also Calzavarini 2019 for a 
nuanced analysis of this connection and Liu 2022 on the important role mental 
imagery plays in polysemy processing). 

While nothing of what I have said in this chapter so far is particularly con- 
troversial, I want to close with a consequence of this general picture of the 
relation between imagistic and linguistic representation, which is more controversial. 
It concerns one of the most widely researched psychological phenomena: 
the Stroop effect (see Stroop 1935; see also MacLeod 1991 for a 
historical summary). 

The Stroop effect has been used in many branches and paradigms of psy- 
chological research, including ones I have touched on in this book, like the 
study of synesthesia, where the real hallmark of synesthetic experience is that 
it shows the Stroop effect—or, as it is often put, it Stroops (not many experi- 
ments get to be used as verbs). 

The classic Stroop task is very simple: you have to name the color of words 
printed on a page. If these words are color words (like “red” or “blue”), where 
the color named and the color it is printed in are different (say, “red” printed 
in blue), the reaction time increases significantly. 

What explains this odd difference? There are two major explanations, the 
first one dominant in the second half of the twentieth century, the second 
dominant in the last twenty years. According to the first one, the Stroop effect 
is about attention capture. The linguistic stimulus captures our attention, and 
as a consequence, less attention remains for the processing of the color stimu- 
lus (see MacLeod 1991 for a summary). According to the second one, the 
Stroop effect is about conflict monitoring and control: there are control mecha- 
nisms that detect the conflict between the linguistic and the color stimulus and 
they prioritize the processing of the language stimulus (Botvinick et al. 2001). 

The attention account and the conflict monitoring account of the Stroop 
effect are very different inasmuch as the former gives a fully bottom-up expla- 
nation, whereas the latter a top-down one in terms of the effect of the seman- 
tic meaning of the word on the processing of color. But they share an 
important premise, namely, that the Stroop effect is about access to motor 
control. Depending on whether the word “red” is printed in red or blue, our 
access to the motor control (of reading the word) is different and this explains 
the difference in our reaction time. This is clear enough in the attention 
account, but it is also what is behind the conflict monitoring account, where 
“conflict may be operationally defined as the simultaneous activation of 
incompatible representations [...] e.g., representations of alternative responses” 
(Botvinick et al. 2001, p. 630). 

I will argue that the connection between language processing and mental 
imagery suggests another possible explanation: reading the color word 
triggers, laterally and automatically, visual imagery of the color and this 
interferes with the perceived color of the word. 

The conflict between the color and the meaning of the word starts much 
earlier than motor control. Here is an experiment that supports this hypothe- 
sis directly (there may be some indirect support from findings about the 
Stroop effect for color-related words as well (like “sky” (for blue) and “fire” 
(for red)); see Dalrymple-Alford 1972). A recent experiment shows that 
even if we control for all the attentional and other mechanisms that deter- 
mine motor control, the activation patterns in V4—the part of the visual cor- 
tex that is responsible for color processing—would be difficult to explain 
unless we posit early sensory involvement in the Stroop effect (Purmann and 
Pollmann 2015). 

Given that V4/V8 is devoted (mainly) to color processing, it is active 
throughout any color Stroop task. More generally, the involvement of V4 in 
the Stroop task is somewhat difficult to examine experimentally given that 
without the functioning of these regions, the effect goes away. So some tricks 
are required to gain any insight into exactly how early cortical color process- 
ing is involved in the Stroop task. The experimenters examined the ways in 
which the previous trial in a series of Stroop tasks influences the current trial 
(Purmann and Pollmann 2015). So the question they raised is how your early 
sensory cortices behave, depending on the order of these trials. If you read 
the word “red” printed in blue, there is a conflict: it is an “incongruent trial.” 
If you read the word “blue” printed in blue, there is no conflict: it is referred 
to as a “congruent trial.” 
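
The bookkeeping behind this kind of sequential design can be made concrete with a minimal sketch in Python. It only labels each trial as congruent or incongruent and pairs that label with the label of the preceding trial. The names StroopTrial and label_by_previous are hypothetical, introduced purely for illustration, and nothing here reproduces the materials or the analysis of Purmann and Pollmann (2015).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StroopTrial:
    """Hypothetical record of one color Stroop trial."""
    word: str       # the color word shown, e.g. "red"
    ink_color: str  # the color the word is printed in, e.g. "blue"

    @property
    def congruent(self) -> bool:
        # A trial is congruent when the word names its own ink color.
        return self.word == self.ink_color


def label_by_previous(trials: List[StroopTrial]) -> List[Tuple[Optional[str], str]]:
    """Pair each trial's type with the type of the trial before it.

    The first trial has no predecessor, so its 'previous' label is None.
    """
    labels: List[Tuple[Optional[str], str]] = []
    previous: Optional[str] = None
    for trial in trials:
        current = "congruent" if trial.congruent else "incongruent"
        labels.append((previous, current))
        previous = current
    return labels


sequence = [
    StroopTrial("blue", "blue"),  # congruent
    StroopTrial("red", "blue"),   # incongruent, preceded by a congruent trial
    StroopTrial("blue", "red"),   # incongruent, preceded by an incongruent trial
]
labels = label_by_previous(sequence)
```

On this labeling, the comparison of interest is between incongruent trials that follow a congruent trial and incongruent trials that follow another incongruent trial.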

The question is whether early sensory processing is different depending on 
whether an incongruent trial was preceded by another incongruent trial. And 
what the results show is that activities in V4 are very different, depending on 
whether the previous trial was congruent or incongruent. Interestingly, the 
same effect was not observed in language processing regions of the brain, 
only in V4. If we take the Stroop task to be about motor control, these results 
make no sense. But if, as I am suggesting, it is at least partly about sensory 
processing, these results are exactly what we should expect. 

The color of the word activates V4 bottom-up (that’s perception). And the 
reading of the word activates V4 laterally and automatically (that’s mental 
imagery). And the processing of the perceived color is slowed down because 
of the interference of the mental imagery. In short, the conflict between the 
color and the meaning of the word starts already in perceptual processing. 

But language is not the only mental capacity that is deeply intertwined with 
mental imagery. The following chapters outline how memory, imagination, 
and emotions all have more to do with mental imagery than is usually assumed. 
Memory 


The term “memory” is used to refer to a wide and very heterogeneous variety 
of mental phenomena. We have already encountered two, which show the 
diversity of memory nicely. The first one was trans-saccadic memory, the 
early cortical representation that maintains perceptually represented features 
across saccades. And the second one was episodic memory, say, remembering 
what you had for breakfast this morning (Bernecker 2010; De Brigard 2014; 
Michaelian 2016; Hopkins 2018). I argued on the basis of experimental results 
that both trans-saccadic memory and episodic memory necessarily rely on 
mental imagery (in Chapters 18 and 5, respectively). 

In this chapter, as well as in Chapter 21, I want to examine three mental 
phenomena under the umbrella of the general category of memory, which all 
fall somewhere between the very simple low-level trans-saccadic memory 
and the very complex high-level episodic memory: visual working memory, 
the Sperling experiments, and boundary extension.¹ I will discuss the first two 
in this chapter, saving boundary extension for Chapter 21. 

First, visual working memory is a behavioral category: it is the representa- 
tion that is posited in order to explain how we manage to reidentify perceptu- 
ally represented features after the input is gone. A standard setup for the study 
of visual working memory is a simple display of, say, three small squares of 
different colors, presented for a short time, followed by a couple of seconds of 
blank screen (or masking), which is then followed by a slightly different dis- 
play, where the color of one of the squares has changed (see, for example, 
Luck and Vogel 2013). If the subject can identify this change, what allows 
them to do so is the visual working memory. 

1 There are other interesting connections between memory and mental imagery that I will only 
briefly mention here. Imagery training improves memory (in fact, findings along these lines sparked 
the revival of research into mental imagery in the 1960s; see Luria 1960, 1968; Yates 1966). Another 
important set of findings is about how vivid imagery can lead to misremembering or the modification 
of one’s memories (Gonsalves et al. 2004; Stephan-Otto et al. 2017a, 2017b). 

Mental Imagery: Philosophy, Psychology, Neuroscience. Bence Nanay, Oxford University Press. © Bence Nanay 2023. 
DOI: 10.1093/oso/9780198809500.003.0020 

This sounds very much like past-oriented temporal mental imagery (Keogh 
and Pearson 2011; Tong 2013). And the connection between visual working 
memory and temporal mental imagery is further strengthened by studies that 
show strong interference effects between visual working memory and mental 
imagery (Hyun and Luck 2007). If, instead of the blank screen, a mental 
imagery-involving task is presented between the two visual displays (say, a 
mental rotation task), the subjects’ performance on the reidentification task is 
much worse. This is presumably because the mental imagery that is required 
for the mental rotation task interferes with the representation that allows us 
to reidentify the squares. Thus, the representation that allows us to reidentify 
the squares is either mental imagery or at the very least uses the same 
resources, which explains why one representation interferes with the other. 

Can we then just conclude that visual working memory is a form of mental 
imagery (just like trans-saccadic memory)? I don't think so. Some (in fact, 
most) instances of visual working memory would indeed qualify as a form of 
mental imagery. But there are some relatively recent findings about visual 
working memory that make the connection between visual working memory 
and mental imagery even more significant. 

It has been shown that the representation that allows us to successfully 
reidentify stimuli (that is, visual working memory) does not have to involve 
V1 activation, or even any early cortical activation (Rose et al. 2016; Sprague 
et al. 2016; Trubutschek et al. 2017; Wolff et al. 2017). More generally, there 
seems to be remarkable interpersonal variation in people's use of visual work- 
ing memory, often corresponding to the vividness of subjects’ imagery 
(Pearson and Keogh 2019; see also Jacobs et al. 2017). 

These results seem to show that while some instances of visual working 
memory in some subjects do indeed amount to mental imagery, which also 
explains the influence of visual working memory in amodal completion (Lee 
and Vecera 2005, 2010) and on visual processing more generally (Teng and 
Kravitz 2019), other instances of visual working memory in other subjects are 
very different. So visual working memory is a heterogeneous category, which 
is hardly surprising given that it is defined in very broad behavioral terms: the 
representation that allows us to reidentify stimuli. Mental imagery, in con- 
trast, is, in spite of the wide variety of phenomena that fall under it, less het- 
erogeneous in terms of its implementation. In this sense, mental imagery is 
more of a natural kind than visual working memory (see also Pearson and 
Keogh 2019 for a similar point, as well as Gomez-Lavin 2021 for a related 
skeptical argument about working memory in general). 

> Because of these radical variations in how visual working memory works (Pearson and Keogh 
2019; see also Gomez-Lavin 2021), I set aside debates about the format and the allegedly holistic 
nature of visual working memory, as well as its relation to iconic memory, for the purposes of this 
discussion (but see Burns 1987; Fougnie and Alvarez 2011; Gross and Flombaum 2017; Wang et al. 
2017; Pratte 2018). 

The second topic concerning memory that I will discuss in this chapter, 
besides visual working memory, is the Sperling experiment. The discussion 
about the role of attention in mental imagery in general, and in multimodal 
mental imagery in particular, has a special perk: it can help us to understand 
what is going on in one of the most widely discussed experimental findings in 
philosophy of mind: the Sperling experiment (Sperling 1960; Averbach and 
Sperling 1961). 

The experiment is simple and it involves (although this is rarely stressed) 
some crossmodal effects. The subjects are presented with an array of twelve 
letters arranged in three rows of four letters. This is presented for a brief 
period and subjects can recall three or four of these letters. But here is a 
slightly different version of the same setup. When this visual display is over, it 
is followed by an auditory cue (high, medium, or low pitch; subjects are 
trained to associate these with the top, middle, and bottom row, respectively), 
which indicates the row the subjects should attend to. And now the subjects can 
recall three or four letters of the indicated row. This is a much higher recall 
rate (per row) than without the post hoc auditory cue. So it seems that we can 
represent the stimulus of all twelve letters (or almost all of them) even after 
we cease to see them. 
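
The inference from the cued condition to the whole array is a simple extrapolation, and it can be written out as a rough sketch. The symbols are introduced here for illustration only and are not Sperling's own notation: R is the number of rows, k the number of letters recalled from the cued row, and N-hat the estimated number of letters still represented when the cue arrives. The sketch assumes that, because the cue comes only after the display is gone, the cued row is representative of every row.

```latex
\[
  \hat{N} = R \times k, \qquad R = 3,\; k \in \{3, 4\}
  \;\Longrightarrow\; \hat{N} \in \{9, 12\} \text{ out of } 12,
\]
\[
  \text{whereas recall without the cue is only } 3 \text{ or } 4 \text{ letters out of } 12 .
\]
```

On these figures, the estimate from the cued condition is between about two and four times the uncued recall.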

Philosophers love this. The most famous use of the Sperling experiments 
concerns the debate about phenomenal and access consciousness. And the 
argument here is that the Sperling experiments show that phenomenology 
overflows access: there are stimuli we are phenomenally conscious of but that 
we cannot access (Block 1995, 2011; see also Dretske 2006; Tye 2006).* The 
second experiment shows that at the time of the auditory cue, we must be 
phenomenally conscious of at least three letters per row, otherwise the cue 
couldn't have the effect it has. So that’s at least nine letters in the matrix. But, 
as the first experiment shows, we can only recall, so we only have access to, 
three or four of these letters. So phenomenology overflows access (and at a 
rate of nine to three/four!). And this would then show that phenomenology 
could not be reduced to the functional notion of access. 

* I do not mean to suggest that Block, Dretske, and Tye are all in agreement about the interpretation 
of the Sperling findings; see Nanay (2009c) for the differences between Dretske and Tye in this 
respect. 

This argument has been criticized in many ways (Dehaene et al. 2006; 
Kouider et al. 2010; Cohen and Dennett 2011; Carruthers 2017; see also 
Phillips 2011a for a very good overview of the debate). My aim here is to step 
back and instead of arguing about what follows from the Sperling experi- 
ments for theories of consciousness, focus on what the Sperling experiments 
show that would have any relevance in this debate at all. And I aim to point 
out that the Sperling experiments do not demonstrate anything we should not 
expect on the basis of the account of (multimodal) mental imagery I out- 
lined here. 

We know that the retina remains activated (relatively) long after direct reti- 
nal stimulation ceases. A very salient case of this that we have encountered 
before is afterimage: “an image seen immediately after the intense stimulation 
of the eye by light has ceased” (Gregory 1987, p. 13). Afterimages are con- 
scious and they only follow very intense sensory stimulation. But we also 
know that the primary visual cortex also remains activated (relatively) long 
after retinal stimulation (direct or delayed) ceases. 

Crucially, we have strong empirical reasons to think that the retinotopic 
activation of the primary visual cortex remains present for 200-300 millisec- 
onds after the sensory stimulation stops (Rolls and Tovee 1994; see also 
Nikolić et al. 2009). And this was following a very brief presentation of the 
stimulus (for only 16 milliseconds). Newer findings show a 70-100 millisec- 
ond echo in the primary visual cortex (Teeuwen et al. 2021; but see also Sligte 
et al. 2008, which complicates this picture). This cortical process, according to 
my definition, would count as mental imagery: it is perceptual processing 
(and very early perceptual processing) not directly triggered by sensory stim- 
ulation (because the perceptual processing follows the stimulus presentation 
with significant delay (much more than the usual 30 milliseconds)). To put it 
in terms of the terminology used in Chapter 12, the temporal correspondence 
is missing. 

As we have seen in Chapter 12, V1 representation is about 30 milliseconds 
behind sensory input and movement representation in MT is about 45 milli- 
seconds behind. The 200-300 millisecond delay (but even the 70-100 milli- 
second delay) is significantly more than this—this is why this representation 
counts as temporal mental imagery. 

So each time we have sensory stimulation, we also have a 200-300 millisec- 
ond long mental imagery echo of it. This is in itself a very important fact 
about the role mental imagery plays in everyday perception, but what I want 
to focus on is how these findings can help us to understand what is going on 
in the Sperling experiments. 

When the stimulus presentation of the twelve letters is over, we now know 
that the subjects have mental imagery of the twelve letters and continue to do 
so for 200-300 milliseconds. And here the auditorily cued attention operates 
on this mental imagery exactly the way it operates on any kind of mental 
imagery, something I discussed at length in Chapter 10. There I talked about 
visualizing the house you grew up in and shifting your attention from the 
color of the taps to the shape of the kitchen sink—whatever you are attending 
to will become more salient, and what is not in the focus of your attention 
(like the shape of the kitchen sink when you are attending to the taps) is likely 
to remain less salient. And the same goes for the mental imagery of the twelve 
letters following stimulus presentation in the Sperling experiments: we can 
attend to the top row, in which case the letters there will become more salient 
and the letters in the bottom row less salient. And vice versa. If we are not 
attending to any of the rows in particular, the salience is distributed across 
the rows. 

There is, of course, one slight difference between the mental imagery that 
subjects have after stimulus presentation in the Sperling experiments and the 
mental imagery one has when visualizing one’s house. The latter is conscious 
and the former is very unlikely to be conscious. But, as we have seen, this is 
no reason to dismiss either of them as not really a case of mental imagery. 
And given that both mental imagery and attention can be unconscious (for 
the latter claim, see Kentridge et al. 1999, 2008; Jiang et al. 2006; Cohen et al. 
2012), just as we can move around our conscious attention in our conscious 
mental imagery, we can also move around (maybe not voluntarily, but as a 
result of being cued) our unconscious attention in our unconscious mental 
imagery (see also Sergent et al. 2011 for direct evidence of how attention 
influences processing in the primary visual cortex after the stimulus is gone). 

A strong reason to think that subjects have mental imagery of the twelve 
letters in the Sperling experiments is the fragility of the Sperling results when 
it comes to timing. The effect collapses if the time gap between the offset of 
the stimulus and the auditory cue is significantly longer than 300 millisec- 
onds (Averbach and Sperling 1961, figure 17; see also Di Lollo 1977; Coltheart 
1980), which is exactly the time that the mental imagery following sensory 
stimulation is supposed to last for (see Rolls and Tovee 1994; Nikolić et al. 
2009). The Sperling results can be triggered only as long as the mental imagery 
of the twelve letters is present. 

If we think of this representation of the twelve letters as mental imagery, 
the Sperling results are exactly what we should expect: those aspects of the 
mental imagery we attend to are more likely to become conscious than those 
that we are not attending to—as we have seen in the case of the taps and the 
kitchen sink. But then what we should expect in cases where we are cued to 
attend to some aspects of our mental imagery is that these aspects are more 
likely to become conscious and that’s exactly what the Sperling experiment 
shows. In short, the Sperling results should not surprise anyone: of course 
those aspects of our mental imagery that we are attending to are more likely 
to become conscious than those we are not attending to. 

How does this emphasis on mental imagery in the Sperling experiments 
change the philosophical import of these findings? There seems to be an 
agreement between the proponents and the opponents of the overflow argu- 
ment about the presence of some kind of representation of the twelve letters 
at the moment of the auditory cue (see, for example, Coltheart 1980, p. 184; 
Block 2011; Phillips 2011a, section 5). The question is whether these repre- 
sentations are phenomenally conscious. The proponents of the overflow argu- 
ment say they are, the opponents say they are not. 

In the light of the discussion in this chapter, it should be clear that the 
assumption that these representations are phenomenally conscious is unsup- 
ported, as activation of the primary visual cortex (which is all we can assume 
here) in no way guarantees any kind of conscious awareness. In other words, 
my view is in sharp opposition with the overflow view of the Sperling experi- 
ment (according to which the representation of the twelve letters is conscious) 
and in broad agreement with that of Phillips (2011a) (according to which it is 
unconscious). An advantage of my view over Phillips's is that while Phillips 
needs to posit such unconscious representation solely in order to explain the 
Sperling findings, we have independent and very strong reasons to posit men- 
tal imagery—(normally unconscious) early cortical representations that are 
not directly triggered by sensory input. 

In this chapter, I discussed two memory phenomena that can be better 
explained if we appeal to mental imagery. The next chapter is about a third 
one, where mental imagery plays an even more important role. 


21 


Boundary Extension 


The third of the three memory phenomena I want to discuss is boundary 
extension (see Nanay 2021e for a longer version of this argument). I am 
devoting a full chapter to it because it is a nice illustration of how focusing on 
mental imagery can help us make progress in contested empirical and con- 
ceptual questions. 

Look at Figure 12A. Now look away, do something else and try to remem- 
ber the image. What you will recall is more similar to Figure 12B. When we 
remember a scene, we remember more than what we saw. Literally more: the 
scene’s boundaries are wider than the boundaries of the scene we saw. This 
phenomenon is called boundary extension. 

Boundary extension is one of the most robust psychological findings about 
memory. It holds across age groups (Seamon et al. 2002); experimental meth- 
ods (drawing from memory vs. picking a picture that matches our memory, 
but see Bainbridge and Baker 2020); length of exposure time (how long we 
are looking at the scene, see Intraub et al. 1996); length of time gap (between 
seeing the scene and recollecting it, see Intraub et al. 2008); depictive style 
(photos vs. drawings, Gagnier and Intraub 2012); image content (Candel et al. 
2003); and so on (see Hubbard et al. 2010 for a very thorough summary of 
these findings; and Bainbridge and Baker 2020 for some recent findings that 
complicate this picture somewhat). 

In the philosophy of memory, boundary extension is used as an example of 
the constructive nature of memory. This fits into a wider set of findings about 
memory that all seem to demonstrate that memory formation is not a matter 
of copying perception into our memory. Memories are, rather, constructed on 
the basis of the scene we see, but their content is not determined by the scene 
seen (De Brigard 2014; Michaelian 2016; McCarroll 2018; see also Robins 
2019, McCarroll et al. forthcoming). 

Figure 12A What you saw 

Figure 12B What you recall 

This raises an important philosophical question about the boundary extension 
findings (Michaelian 2011; De Brigard 2014; Bernecker 2017; Arango-Munoz 
and Bermudez 2018; Fernandez 2019, esp. pp. 196-8). Is boundary extension 
explained by perceptual adjustment or by adjustment during memory 
encoding? Again, look at Figure 12A. Figure 12A is the stimulus. On the basis 
of this, you have a perceptual experience. Later, you recollect what you saw, 
and this leads to a memory image of Figure 12B. On the face of it, there are 
two possible ways this can happen: 


(i) Stimulus (Figure 12A) → Perceptual experience (Figure 12A) → 
Memory (Figure 12B) 

(ii) Stimulus (Figure 12A) → Perceptual experience (Figure 12B) → 
Memory (Figure 12B) 


According to (i), boundary extension is a form of adjustment during mem- 
ory encoding. The stimulus of Figure 12A leads to the perceptual experience 
of Figure 12A, just like it should. Perception is veridical. The adjustment hap- 
pens when Figure 12A is encoded in memory—the memory encodes 
Figure 12B, rather than Figure 12A. As Kourken Michaelian says, “the repre- 
sentation of the scene is modified automatically as a memory of this scene is 
formed” (Michaelian 2011, p. 326). 

According to (ii), in contrast, boundary extension is a form of perceptual 
adjustment. There is no adjustment during memory encoding: the perceptual 
experience of Figure 12B leads to the memory of Figure 12B. But when we 
look at Figure 12A, we experience Figure 12B. The reason for this might be 
that when we look at Figure 12A, we supplement the stimulus with our own 
“perceptual schema” and this leads to the experience of Figure 12B (Intraub 
and Richardson 1989; Intraub 2002, 2012). 

My aim is to propose a third explanatory scheme, according to which the 
extended boundary of the original scene is represented by means of mental 
imagery. And given the similarities between perception and mental imagery, 
the memory system encodes both the part of the scene that is represented 
perceptually and the part of the scene that is represented by means of mental 
imagery. This means that boundary extension is neither perceptual adjust- 
ment nor memory adjustment. 

My claim is that boundary extension is a two-step process and we have 
plenty of empirical evidence for how both of these steps work. I propose the 
following scheme instead of (i) and (ii) above: 


(iii) Stimulus (Figure 12A) → Perceptual experience (Figure 12A) → Mixed 
perception/imagery experience (Figure 12B) → Memory (Figure 12B) 


The first step is that looking at a picture activates early cortical representa- 
tions of the space immediately outside the boundaries of the picture. Note 
that this is a different and much weaker claim than the perceptualist’s 
explanation (along the lines of (ii)) that we do have perceptual experience 
of the space beyond the picture boundaries. But this space is nonetheless 
represented by the visual system. 

What is important from our point of view is that the early visual cortices 
represent the missing parts of the scene that fall just beyond the picture 
boundary (Chadwick et al. 2013). This early cortical representation of the 
parts of the scene just outside the picture boundaries is produced by percep- 
tual processing (again, perceptual processing in the early visual cortices) 
without direct sensory stimulation. In short, it is mental imagery. Whenever 
we see a picture, we have mental imagery (in the sense used here) of the scene 
just outside the boundaries of the picture. 

Most of the time, this mental imagery is unattended. But it can also be 
attended, especially in some examples of visual art, where the artists very 
explicitly try to evoke our mental imagery of the scene outside the frame. One 
famous example would be Degas, who liked to place the protagonists of his 
paintings in such a way that only parts of them are inside the frame. The rest 
we need to complete by means of mental imagery. In some extreme cases (for 
example, Dancers climbing the stairs, 1886-1890, Musée d’Orsay), we only 
see someone's arm or the top of their head and we need to complete those 
parts of their body that are outside the frame by means of mental imagery. 
Another example is Buster Keaton, who also used the viewer's mental imagery 
of the off-screen space in his films, but normally for comical effect. One 
example is the first shot of his short film Cops (1922), where we see the pro- 
tagonist in close up behind bars and looking depressed. The second shot 
reveals that he is behind an iron gate talking to a girl, the object of his unre- 
quited love (see Chapter 31 for more examples of this kind). 

To sum up, the first step of the boundary extension process is that we rep- 
resent the scene just outside the boundaries of the picture by means of mental 
imagery. In other words, looking at Figure 12A leads to a hybrid perception/ 
imagery representation of Figure 12B, where the parts closer to the edges 
(basically the difference between Figure 12B and Figure 12A) are represented 
by mental imagery. 

The second step takes us from this hybrid perception/imagery representa- 
tion of Figure 12B to the memory of Figure 12B. And just as in the case of the 
first step, we have plenty of independent empirical evidence about how this 
step works. We know (see Chapter 7) that the perceptual system is prone to 
treat perception and imagery similarly. 
Remember that mental imagery is representation that is produced by 
perceptual processing in the early sensory cortices without direct sensory 
stimulation. Both perception and mental imagery then amount to representation 
that is produced by perceptual processing in the early sensory cortices: the 
only difference between them is whether this processing is triggered by 
direct sensory input. To use amodal completion as an analogy for a second, 
if you look at the landscape through a mosquito net, strictly speaking you 
perceive little squares of the landscape and you represent parts of the land- 
scape that are between these little squares (that are occluded by the net itself) 
by means of amodal completion. This nonetheless gives rise to a unified visual 
experience. 

The memory system takes the hybrid perception/imagery representation of 
the scene and transforms it wholesale into memory (regardless of what part of 
this representation was imagery and what was perception). It transforms the 
hybrid perception/imagery representation of Figure 12B into memory of 
Figure 12B. To go back to the amodal completion analogy: when we look at 
the landscape through the mosquito net, we later recall not only the little 
squares we actually perceive, but the landscape that is the hybrid of amodally 
completed parts and perceived parts. 

I will argue that this two-step explanation of boundary extension combines 
the explanatory benefits of the perceptual adjustment and the memory adjust- 
ment account. And it does so without inheriting their problems. 

As we have seen, the most influential version of the perceptual account of 
boundary extension is the perceptual schema account (Intraub and 
Richardson 1989; Intraub 2002). One of the most important objections to this 
account comes from the experiments that show that boundary extension is 
also present for scenes with no recognizable objects. So our perceptual system 
is not in a position to apply a perceptual schema of the scene outside the 
boundaries on the basis of recognized objects inside the boundaries, for the 
simple reason that there are no recognized or even recognizable objects inside 
the boundaries (only random dots; for an example see McDunn et al. 2014; 
see also Mamus and Boduroglu 2018 for similar results involving semanti- 
cally inconsistent scenes). 

While these are clearly difficulties for the perceptual schema account, they 
pose no problem for my account as the early cortical representations of the 
scene behind the boundaries of the picture are not necessarily formed in 
response to top-down information about the specific objects inside the frame 
(which would amount to a perceptual schema). They could be based solely on 
the geometrical features of the patterns inside the frame (see Vrins et al. 2009 
for a summary). 

Another consideration that militates against the perceptual account comes 
from subjects with bilateral hippocampal damage (which is a memory disor- 
der). These subjects show much less boundary extension than controls 
(Mullally et al. 2012; see also Jajdelska et al. 2019). This seems to suggest that 
boundary extension has to do with memory, not perception, as memory 
impairment has a significant influence on it. Note, however, that the hippo- 
campus has a well-demonstrated influence on mental imagery (again, on rep- 
resentations that are produced by perceptual processing in the early sensory 
cortices without direct sensory stimulation). In other words, my explanatory 
scheme predicts that bilateral hippocampal damage would interfere with 
boundary extension. 

Finally, proponents of the memory account often dismiss the perceptual 
schema account on the basis of the fMRI findings of Park et al. (2007), which 
could be taken to show that the early visual cortices are not involved in 
boundary extension. It is important to point out that the Park et al. findings 
do not in fact show that the early visual cortices are not involved in boundary 
extension, something the authors of the study later themselves explicitly 
acknowledge (see Park and Chun 2014, esp. pp. 63-5). Further, more recent 
fMRI experiments on how the visual cortices behave in boundary extension 
(see Chadwick et al. 2013 and Park and Chun 2014 for summaries) show that 
the early visual cortices are very much involved in boundary extension, just as 
my account predicts. 

So much for anti-perceptual considerations. But the memory account of 
boundary extension has also been argued to be inconsistent with some empir- 
ical findings. 

While the emotional content of pictures does not seem to have an effect on 
boundary extension in general (Candel et al. 2003), for highly anxious sub- 
jects, negative arousal (and only negative arousal) does have a consistent 
effect (towards less boundary extension) (Matthews and Mackintosh 2004). 
One way of explaining this is that the attention of these subjects is engaged at 
the central part of the picture (where the emotional content is) and this takes 
the attention away from the boundaries. More generally, it has been found 
that if our visual attention is engaged elsewhere, this influences boundary 
extension (Intraub et al. 2008). 

Attention is a perceptual phenomenon, so this seems to be a point in favor 
of the perceptual account and a point against the memory account. Note, 
however, that my account can explain these findings in a straightforward 
manner. Mental imagery is as sensitive to the allocation of attention as sen- 
sory stimulation-driven perception (see Nanay 2015a for a summary; see also 
Chapter 10). So if our attention is engaged elsewhere, this has consequences 
for the details of the mental imagery of the scene just outside the boundaries 
of the picture. My account can explain the effects of attention on boundary 
extension as much as the perceptual account can. 

We have seen that my account is not vulnerable to the most widespread 
objections to the perceptual and memory accounts. On the other hand, some 
empirical findings seem to support my account. 

First, the most consistent way of canceling out boundary extension is by 
making the frame extremely salient (Gottesman and Intraub 2002, 2003). It is 
not clear why the prominence of the frame should influence the perceptual 
schema or the way this scene is encoded in memory, but we can explain this 
effect in terms of the mental imagery of the scene outside the boundaries of 
the picture in a much more straightforward manner, as the salience of the 
frame works against the early cortical representations of the scene just outside 
the boundaries of the picture. 

Second, boundary extension can be induced haptically—by means of touch 
(Intraub 2004). This seems very difficult to explain in terms of memory error 
or (visual) perceptual schemas. On the other hand, given the vast amount of 
research on the visual mental imagery of (non-cortically) blind people (see 
Chapter 15) and also the research on haptically induced multimodal visual 
imagery (where sensory stimulation in the haptic sense modality triggers 
mental imagery in the visual sense modality; see James et al. 2002; see also 
Nanay 2018a for a summary), this finding is exactly what my account would 
predict, as multimodal mental imagery is triggered in a similarly automatic 
and involuntary manner as the mental imagery involved in representing the 
scene just outside the boundaries of the picture. 

Finally, there is an objection that could be raised directly about my 
explanatory scheme (see also Gottesman and Intraub 2003). Explicitly 
imagining the scene outside the frame does not increase boundary extension 
(Munger and Multhaup 2016). Experimenters asked the subjects to imag- 
ine what the photographer would see if she zoomed out, or to imagine the 
smells and sounds coming from outside the frame, and this did not have an 
effect on boundary extension. One might think that this is a problem for 
my account, but it is not. As we shall see in Chapter 22, it is an open debate 
whether imagination presupposes the exercise of mental imagery (Kind 
2001), but even if it does, it is a very specific way of using mental imagery, 
which is very different from the automatic, involuntary formation of 
mental imagery outside the boundaries of the picture. And the relation 
between mental imagery and imagination is exactly the topic I am now 
turning to. 


22 


Mental Imagery versus Imagination 


Mental imagery is not imagination. Imagining is something we do. We imagine 
things. It is a mental action and (typically) a voluntary act. Mental imagery 
is not. Mental imagery is not a mental action and, crucially, it can be invol- 
untary. When we have flashbacks to an unpleasant scene, this is mental 
imagery; it would not count as imagination in any sense of the term (see 
also Gregory 2010, 2014; Langland-Hassan 2015; Wiltsher 2016; Arcangeli 
2020 on the differences between imagination and mental imagery). It is 
involuntary mental imagery. The same goes for earworms: annoying tunes 
that go through our head in spite of the fact that we really don't want them to. 
Again, this is not auditory imagination, but it is auditory mental imagery. In 
spite of these differences, as we have seen in Chapter 2, throughout the his- 
tory of philosophy people used the term “imagination” to refer to what we 
now would describe as mental imagery. 

Thus far, I have just used the term “imagination” as if it were a unitary con- 
cept. But it is not. Imagination comes in many forms. Probably the most com- 
monly drawn distinction between different imaginative episodes is between 
sensory and propositional imagination. Sensory imagination is imagining 
seeing, hearing, smelling, etc. something. More generally, as Paul Noordhof 
says, “the distinctive feature of [sensory] imagining is that a condition of its 
success is to recreate the sensory experience of the thing imagined” (Noordhof 
2008, p. 337). Propositional imagination is imagining that such and such is 
the case. In other words, the former is imagining perceiving x, whereas the 
latter is imagining that x is F. Imagining seeing the Eiffel Tower from across 
the river is sensory imagination. Imagining that the Eiffel Tower is in Rome is 
propositional imagination. 

One important question in the philosophy of imagination is about how to 
draw the line between these two forms of imagination. Another, related, ques- 
tion is which one has anything to do with mental imagery? We have seen that 
we can have mental imagery without imagination (see the flashback and 
the earworm examples). But how about the other way round? Can we have 
imagination without mental imagery? In other words, does imagination nec- 
essarily involve the exercise of mental imagery (Kind 2001; Van Leeuwen 
2016; Langland-Hassan 2020)? The answer depends on what kind of imagination 
the question is about. 

There is strong agreement that sensory imagination, for example imagining 
seeing the Eiffel Tower from across the river, does necessarily involve mental 
imagery (in this case, visual imagery). But there is no agreement about whether 
mental imagery is necessarily involved in propositional imagination, for example 
imagining that Paris is the capital of Italy. 

And here we need to introduce yet another category in the discussion of 
sensory and propositional imagination, namely, supposition (for example, 
supposing something for the sake of argument; see Arcangeli 2019). In the 
philosophy of imagination, distinguishing sensory and propositional imagi- 
nation often involves distinguishing both of these forms of imagination from 
supposition. So not one, but two division lines need to be drawn: one between 
sensory and propositional imagination and one between propositional imagi- 
nation and supposition. 

Mental imagery is used heavily in these debates about how to delineate 
sensory imagination, propositional imagination, and supposition, but, 
depending on where one stands on whether propositional imagination neces- 
sarily involves mental imagery, it is used very differently. Those who believe 
that mental imagery is necessary for propositional imagination can, and often 
do, draw the line between imagination (both sensory and propositional) and 
supposition in terms of mental imagery: imagination involves mental 
imagery, whereas supposition does not (see Kind 2001 for a modern locus 
classicus). Those, in contrast, who deny that mental imagery is necessary for 
propositional imagination can, and often do, draw the line between sensory 
imagination on the one hand and propositional imagination as well as suppo- 
sition on the other in terms of mental imagery: mental imagery is necessary 
for sensory imagination, but not for propositional imagination and supposi- 
tion. Either way, the relation between mental imagery and imagination is of 
crucial importance in how the terrain of imaginative states is broken down. 

It needs to be pointed out that many of the arguments on either side appeal 
to introspection (Chalmers 2002; Byrne 2007). If we allow for unconscious 
mental imagery, as I argued in Chapter 4 that we should, then these argu- 
ments would not lead to any kind of conclusive resolution. It may be that no 
conscious images flash in the philosopher’s mind, but it does not follow from 
this that no mental imagery is involved in this imaginative episode. 

And as we have seen in Chapter 19, mental imagery is involved in language 
processing, so, assuming that propositional imagination relies on language 
processing, it is also involved in propositional imagination. To go back to my 
example, imagining seeing the Eiffel Tower from across the river does involve
visual mental imagery. But so do episodes of propositional imagination, like 
imagining that Paris is the capital of Italy. The mental imagery that is involved 
in imagining that Paris is the capital of Italy (say, the gustatory imagery of 
good coffee in a Parisian café), may not fix the content of this imaginative 
episode (something very explicit in Kind 2001). But it is triggered automati- 
cally each time we have a mental episode with language-like components. 
Further, an imaginative episode in one sense modality (say, audition) also 
automatically triggers early cortical representations in other sense modalities 
(say, vision) (see Vetter et al. 2014; see also Bergen et al. 2007 for more empir- 
ical support and Stokes 2019 for a philosophical summary). Supposition also 
involves mental imagery for the very same reasons. 

So, according to the classic picture, there is a division line between sensory 
imagination and propositional imagination and there is also a division line 
between propositional imagination and supposition. I argued elsewhere that
there is a reason why it has proven so challenging to draw both these division
lines: the logical space between sensory imagination and supposition is not
very significant (see Nanay forthcoming c). So there is not much logical space
remaining for propositional imagination. Here I want to focus on the ways in 
which propositional imagination relies on mental imagery. 

Let us start with a widely acknowledged and salient difference between 
imagination and supposition, which is highlighted in the imaginative resist- 
ance literature (Walton 1994; Gendler 2000, 2006; Weatherson 2004; Nanay 
2010c; Camp 2017). We can suppose any proposition whatsoever. We can
suppose, for the sake of the argument, logically or metaphysically impossible 
or ethically dubious propositions, for example. But, if the phenomenon of 
imaginative resistance is a real phenomenon, we can't imagine all propositions.¹
We can't imagine morally dubious propositions, for example, to use
the classic imaginative resistance case, that “In killing her baby, Giselda did 
the right thing; after all, it was a girl” (Walton 1994, p. 37). There are many 
explanations of imaginative resistance, but one thing they all have in common 
is that the set of propositions that we can imagine is narrower than the set of 
all propositions. Hence, the set of propositions that we can imagine is nar- 
rower than the set of propositions we can suppose. 


1 According to some views about imaginative resistance, what stops us from imagining these con- 
tents is not our inability, but our unwillingness (Gendler 2000). While I will formulate imaginative 
resistance as an inability, my argument could be reformulated to fit the imaginative resistance as 
unwillingness views. 
Now let’s turn to another salient difference between imagination and supposition,
which involves the temporal unfolding of these mental processes. If
you suppose, for example, when solving a mathematics problem, that a=b, 
this does not have a very significant temporal profile. As soon as the content 
of the supposed proposition (that a=b) is grasped, the supposition has happened.
This is not so when it comes to imagination. When you imagine that Paris is 
the capital of Italy, this mental episode is not completed when the proposition 
(that Paris is the capital of Italy) is grasped. If it were so, then all propositions 
could be imagined and, as we have seen in the previous paragraph, this is not so.
When imagining that Paris is the capital of Italy, grasping the proposition is 
only the first step. Having grasped the proposition, we then elaborate the 
imagined proposition (and, arguably, it is exactly this elaboration that fails in 
those cases where imaginative resistance kicks in). Peter Langland-Hassan 
captures this feature of propositional imagination when he insists that it is a 
“rich and elaborated [...] thought about the possible” (Langland-Hassan 
2020, p. 7). Propositional imagination is “rich and elaborated,” whereas supposition
is neither rich nor elaborated (or maybe just much less rich and
much less elaborated). 

The big question is, then, where this elaboration comes from. And given 
the involvement of mental imagery in propositional imagination, the obvious 
answer would be that this elaboration happens with the help of mental 
imagery (see also the research on how, in voluntary imaginative episodes, 
mental imagery gets more vivid over time, D’Angiulli and Reeves 2003/2004). 
This would explain why imagination (but not supposition) has an undeniable 
affective dimension (Moran 1994) and also why imagination (but not suppo- 
sition) is a skill you can be better or worse at (Kind 2020). Again, this mental 
imagery does not need to be conscious and even when it is conscious, it does 
not need to be particularly determinate. It does not need to be able to fix the 
content of the imaginative episode either. 

But at this point one may wonder whether one could find other forms of 
elaboration that could be used in the case of propositional imagination 
(which would be missing in the case of supposition). Maybe the elaboration 
in question is not imagistic, but propositional. When imagining that Paris is 
the capital of Italy, this proposition is elaborated with the help of further 
propositions (like the proposition that Paris has good coffee). The problem 
with this proposal is that suppositions are also elaborated with the help of 
further propositions; in fact, this is exactly how reductio ad absurdum arguments
proceed: we suppose the proposition for reductio and then elaborate it
with the help of further propositions up to the point where we hit a 
contradiction. So, if propositional imagination were only elaborated with the
help of further propositions, this would not explain how propositional imagi- 
nation differs from supposition (and why we can't imagine propositionally 
everything we can suppose). 

In short, mental imagery is substantially involved both in sensory imagina- 
tion and in propositional imagination. Sensory imagination is the voluntary 
use of conscious mental imagery. And propositional imagination is the sup- 
position of a proposition that is elaborated with the help of mental imagery. 
Mental imagery is not, in contrast, substantially involved in supposition per 
se (as here mental imagery could be thought of as merely accompanying sup- 
position without being constitutive of it). 

I talked about a major distinction within the category of imagination, 
namely the one between sensory and propositional imagination. But there are 
imaginative episodes that play a very important role in our life that do not fall 
clearly into one of these two categories (see Nanay 2009a, 2021f; Kind 2013 
for taxonomies of different kinds of imaginative episodes). Imagining being 
in someone else’s situation is one of these (Williams 1973; Wollheim 1973; 
Velleman 1996). As has been emphasized, imagining being in someone
else’s situation is a form of self-imagining, which is often contrasted with 
propositional imagination. But does this mean that it is an instance of sensory 
imagination? Crucially, what role does mental imagery play in imagining 
being in someone else’s situation? This is an especially important question, 
given the importance of imagining being in someone else’s situation in 
decision-making. 

Our grand decisions (decisions we struggle with, that we can’t make easily 
and quickly) rely on imagination (see Nanay 2016c).² Think back to some of
the big decisions you have made over the years. Break up with your partner 
or not? Which college to choose? Go to grad school or not? Which job offer 
to take? Which house to bid on? And so on. There are both empirical and 
conceptual reasons to think that you made all of these decisions by imagining 
yourself in one of the two situations and then imagining yourself in the other 
and then comparing the two. Even if you took out the yellow legal pad and 
drew up the pros and cons, your decision was not based on the direct com- 
parison of the number of pros and the number of cons about how your 
desires would be satisfied, in the two respective scenarios, given your back- 
ground beliefs. 


2 These decisions do not need to be, but can be, what these days are called “transformative
decisions.”
Here is an example from my own past. After college, I was accepted into grad
programs in the US and the UK. I thought, probably correctly, that this choice 
would have a major impact on my life course and was really struggling with 
this. I could narrow down the US options to what seemed the best (within the 
US) and I did the same for the UK options. But deciding between becoming 
American and becoming British was just too difficult. I imagined myself in 
Britain, at fancy college dinners, wearing a gown and sipping port. And I 
imagined myself in Californian diners in flip flops with an Oreo shake in hand.
Not an easy comparison. 

The point is that I really had very little idea about just what situations I 
would find myself in. So I actually imagined myself in imagined situations— 
ones that were more informed by films I had seen than reality. But the imagi- 
native episodes that play a role in decision-making are even more complicated. 
Imagination is used not even twice, but three times. Let’s suppose I am mak- 
ing this decision now. Who am I imagining in that Californian diner? My 
future self is very different from my current self, so imagining my current self 
would not be particularly helpful. It is my future self who has the chance to 
hang out in California, but the problem is that we don’t have any firm infor- 
mation about what our future selves will be like. So it is really my imagined 
future self who should appear in these imaginative episodes. In short, when 
we make these grand decisions, we imagine what we imagine to be our future 
selves in imagined alternative scenarios. Imagination is used three times. 

There are strong empirical reasons for thinking that this is how we actually 
do make decisions (see Nanay 2016c for a summary). This is a descriptive 
claim. I did not say anything about whether making decisions this way would 
lead to an optimal decision. On the face of it, not so much. The scenarios we 
imagine ourselves in have very little to do with the actual situations we would 
find ourselves in. I spent relatively little time in Cambridge sporting a gown, 
and really almost no time in Californian diners in flip flops. 

Imagining our future selves is especially unreliable, as we systematically 
underestimate how much we will change in the future, as the psychological 
phenomenon of the End of History Illusion shows (Quoidbach et al. 2013). 
We all think that who we are now is the finished product: we will be the same 
in five, ten, twenty years. But this is not so. Our preferences and values will be 
very different in the not-so-distant future. This is the End of History Illusion. 

There is a final twist here. Remember that decision-making involves imag- 
ining your future self in a hypothetical situation. But your future self will 
largely be formed in response to the decision that you're about to make. 
Hence, it is not possible to reliably imagine your future self that will be in
California or Cambridge. Your future self will be very different from your
present self. So you can’t reliably imagine it. And how it will turn out to be
will depend on your imaginative episode that you use to make your decision. 
I ended up spending a fair bit of time both in California and in Cambridge in 
my life and I can confidently say that a fully Californified version of me would 
be very different from a fully Cambridgified version of me (see Nanay forth- 
coming g on the psychological consequences of this). 

Now we can address the question about whether and how mental imagery 
shows up in the imaginative episodes like imagining someone from the inside. 
And it seems that it is heavily involved in at least one species of imagining 
someone from the inside, namely, the one that is used in decision-making. 
A number of empirical studies show that imagery is a crucial part of the 
imagining that is involved in decision-making. The vividness of imaginative 
episodes has a major influence on decision-making. If you are deciding 
between two positive scenarios, the one that is imagined more vividly tends 
to win out. And if you are deciding between two negative scenarios, the one 
that is imagined less vividly tends to win out (Austin and Vancouver 1996; 
Trope and Liberman 2003; see the rich literature on construal-level theory 
and also on the effects of the vividness of imagination on future discounting 
in Parthasarathi et al. 2017 and Mok et al. 2020; see also the discussion in 
Chapter 25). Not only is imagery a crucial ingredient of this imaginative epi- 
sode (just like it is a crucial ingredient of all imaginative episodes that this 
chapter considered), it also helps us to explain some important features of the 
decision-making process.* 


* See also Wiltsher and Nanay (2021) on the role imagination plays in self-knowledge. 
Emotion


Try to imagine, as vividly as you can, being attacked by a rabid dog, foaming 
at the mouth, snapping at your feet right there under your desk. This is an 
important form of mental imagery and highlights that imagery can dramati- 
cally affect emotions. On the other hand, the impact of emotions on imagery 
is equally significant—the imagery that occupies our minds is very often 
under the control of our dominant emotion, which sometimes alters its fabric 
and our capacity to control it. In other words, there is a two-way interaction 
between emotions and mental imagery (see Holmes and Mathews 2010 and 
Hoppe et al. 2021 for summaries). 

First, the imagery → emotion direction is not particularly surprising: after
all, you yourself presumably experienced this influence if you really did 
imagine the scary dog in the example at the start of the chapter. But there 
have been plenty of experiments that show that imagery doesn’t just influence 
our emotions, it influences our emotions much more strongly, more quickly 
and with more long-lasting effects than linguistic representations (of the same 
content; see Holmes and Mathews 2005; Hoppe et al. 2021).¹

Second, the emotion → imagery direction may be (even) more significant:
Neuroimaging data shows that the subject’s emotional state influences pro- 
cessing in the primary visual cortex (Vetter et al. 2016) and this effect is espe- 
cially strong if the visual input is ambiguous (Gerdes et al. 2014). This effect 
also works crossmodally: an emotionally charged stimulus in the auditory 
sense modality influences the processing in the visual cortex and vice versa. 

Further, it has been known for a long time that emotionally charged stimuli 
(for example, threat cues) improve accuracy and speed of perceptual processing.
But some recent findings show that this effect depends on the subject’s
capacity to form vivid mental imagery (Imbriano et al. 2020). Emotionally 
charged input leads to more vivid imagery, which, in turn leads to faster and 
more accurate responses. 


1 A more surprising example of imagery leading to emotional responses is that amodal completion 
(of neutral contours) triggers positive valence (Erle et al. 2017). 
My final example of how emotion influences mental imagery is the mood
congruency effect (Blaney 1986; Matt et al. 1992; Gaddy and Ingram 2014). 
The most famous example of the mood congruency effect is mood congruent 
memory (Loeffler et al. 2013)—we are more likely to recall scary memories 
when we are scared, for example. But mood congruency also works in the 
case of mental imagery: your general mood makes it more likely that you 
form mental imagery that is congruent with your mood. And it makes it less 
likely that you form mental imagery that is not congruent with your mood. 
We also encode emotionally salient stimuli in a more detailed manner, which 
makes it possible to form more vivid mental imagery (Hamann 2001; Phelps 
2004; LaBar and Cabeza 2006; Yonelinas and Ritchey 2015). 

To sum up, emotions lead to imagery and imagery leads to emotions. There 
are two ways in which this can happen. The first option is that we have 
emotion-free imagery, which influences our emotions, and that, in turn, 
influences our emotion-free imagery. The second option is that our imagery 
itself is emotionally charged (by which I mean that the content of mental 
imagery can’t be fully specified without reference to emotions).² So it is not
the emotion-free imagery state that influences our emotions, it is the already 
emotionally charged imagery that does so. And it is not that imagery is emotion-free
and merely influenced by our emotions; rather, imagery itself is emotionally
charged. This second option would not only be a more parsimonious way
of accounting for this bidirectional interaction, but it is also more in sync 
with empirical findings about the relation between imagery and emotions. 

This doesn’t imply that all imagery is emotionally charged. If you visualize
your long-deceased grandmother, this imagery can be emotionally charged, for 
example. But when you close your eyes and visualize an apple, your imagery may 
not be emotionally charged (unless you really like—or hate—apples). The claim 
is that some but not all mental imagery is emotionally charged. 

This claim also follows from some recent accounts of perception, according 
to which perception itself is emotionally charged as it represents so-called 
“micro-valences” (Lebrecht et al. 2012). If perception (sensory stimulation- 
driven perception) is emotionally charged, then it is difficult to see why men- 
tal imagery (not sensory stimulation-driven perception) would not be. 

I said that imagery itself is emotionally charged. This can mean many 
things (depending on one’s account of emotions and on one’s account of 
mental imagery). It can mean that the imagery attributes emotionally charged 


2 Depending on their views about the relation between affect and emotion, some readers may want
to substitute the term “affectively charged” for “emotionally charged,” here and throughout the chapter.
properties: not just properties like shape and color, but properties like valence,
for example. The imagery, according to this account, represents not just the 
shape and color properties of the vicious dog, but also its negative valence. 

It is important to emphasize that this is just one way of cashing out the 
claim that imagery is emotionally charged. According to other accounts of 
emotions, the emotional charge may be due to some non-representational 
features of mental imagery. And those who may worry that an account of the 
representation of valences in imagery would force us to take sides in the 
grand debate about what kind of properties can be perceptually represented 
(given that valence properties would need to be represented by perceptual 
representations that constitute mental imagery), would not need to reject the 
claim that imagery itself is emotionally charged. 

Crucially, taking mental imagery itself to be emotionally charged has sig- 
nificant explanatory benefits when it comes to explaining a number of empir- 
ical findings. First, it has been known for a while that perceptual learning 
does not require sensory input: mental imagery alone can lead to perceptual 
learning as well (see Tartaglia et al. 2009, 2012). But recently it was found that 
this effect is stronger if the imagined stimulus is emotionally charged (Lewis 
et al. 2013). 

Unless we take mental imagery itself to be emotionally charged, this would 
be difficult to explain: imagery would need to trigger our emotion, which 
then in turn would need to influence the early cortical processing. This would 
amount to a lengthy processing, where first the early cortical representation is 
induced by means of imagery, which, in turn, activates a higher-level emo- 
tional state (and this takes a fair amount of time), which then, in turn, acti- 
vates the early cortical regions again in a top-down manner, to induce 
perceptual learning. It is difficult to see how this long and cumbersome way 
of inducing perceptual learning would be more efficient than the entirely 
early cortical matter of imagery-induced perceptual learning, where the early 
cortical representation that is mental imagery induces perceptual learning 
directly. 

Taking imagery to be emotionally charged also helps us to explain the fol- 
lowing empirical finding: visualizing an emotionally charged event or person 
at an emotionally neutral place confers emotional charge to the place (see 
Benoit et al. 2019). It has been known for a while that seeing a negatively 
valenced event (say, a fight between two friends of yours) at a neutral place 
(say, the corridor in front of your office) makes this formerly neutral place 
inherit the negative valence of the event. So, in the future, when you see 
the corridor of your office, it triggers slightly (or not so slightly) negative 
emotions. The crucial finding is that the same process also takes place even if
you merely visualize a negatively valenced event at a neutral place. In short, 
negatively valenced mental imagery confers valence on various components 
of the imagined scene, which then remain emotionally valenced. 

If we take imagery itself to be emotionally charged, this is easy to explain: 
forming imagery of the negatively valenced event at the neutral location 
would confer the negative valence to the location because the representation 
of this location itself is a valenced affair. If, on the other hand, we assume that 
imagery is emotion-free, the explanation is much less clear. What happens in 
this scenario is that we form emotion-free imagery of, say, the fight in the 
corridor. This is not itself a valenced representation. And this emotion-free 
representation gives rise to the negative emotion. 

There are two options here for the opponents of emotionally charged men- 
tal imagery. First, this valenced representation may not represent the corridor 
and the fight at all. In this case, this negative emotion would need to somehow 
attach itself to the non-valenced representation of the corridor and the fight, 
but it is unclear how or why this would happen as the corridor and the fight 
are, by supposition, represented in an emotion-free manner. 

The second option is that the valenced representation that the emotion-free 
mental imagery gives rise to does represent the corridor and the fight. In this 
case, we should ask how it does so. If it does so by means of mental imagery, 
then the resulting state is an instance of emotionally charged mental imagery 
(which is the very claim I wanted to argue for). If, in contrast, it does so by 
means of a different kind of representation, say, a belief, then we get the fol- 
lowing picture: the emotion-free mental imagery of the fight in the corridor 
gives rise to a valenced belief (or some other valenced non-imagery represen- 
tation) about the possibility of there being a fight in the corridor. Besides the 
ad hoc postulation of a number of mental states and processes, which is justi- 
fied only by the attempt to reject the view that imagery is emotionally charged, 
it should also be noted that this picture does not in fact explain why such a 
belief about the possibility of a fight would confer emotional valence to the 
corridor. In short, taking imagery to be emotionally charged can explain why 
this emotional infection happens in imagery in a straightforward manner, 
whereas the alternative can't. 

Finally, taking imagery itself to be emotionally charged also has significant 
philosophical consequences. One of the most important questions in the phi- 
losophy of emotion is about what kind of mental states emotions are. Are they 
perceptual states (Prinz 2004; Döring 2007), quasi-perceptual states (Roberts
2003), beliefs (Solomon 1977; Nussbaum 2001), belief-like states (Greenspan 
1988; Helm 2001; Brady 2009), or maybe states of “action readiness” (Frijda
2007; Scarantino 2014)? One thing that all the parties in these debates agree 
on is that emotional states are representational states: they have content. 
When I am afraid of a dog, my emotion is directed at something out there in 
the world: the dog. They also agree that emotions can be triggered in a variety 
of ways. 

Take the following examples: 


(i) I see a scary dog and this perceptual state gives rise to an emotional 
state (of being afraid of it). 
(ii) I remember a particularly threatening encounter with a dog and this 
episode of remembering gives rise to an emotional state. 
(iii) I imagine a particularly threatening encounter with a dog and this 
episode of imagining gives rise to an emotional state. 
(iv) I think about a dog and this gives rise to an emotional state. 


In these examples, emotional states are triggered by four different kinds of 
mental states: perceptions, rememberings, imaginings, and thoughts. In these 
four examples, we are in a non-emotional state (perceiving, remembering, 
imagining, thinking) and this gives rise to an emotional state. One important 
desideratum for any theory of emotion is to explain how emotions can come 
about in all these routes (by perception, memory, imagination, thought, etc.). 

The big question is, again, about the nature of this emotional state: if it rep- 
resents a dog, does it do so perceptually or rather in a belief-like manner (or 
some other way altogether)? And depending on which ways the emotional 
states are brought about, different answers would seem appealing. If we are 
focusing on emotions we have towards things we perceive, the perceptual the- 
ories of emotion will be appealing, as then there is a smooth transition from 
the non-emotional perceptual state (like the one in (i) above) to the emotional 
state it gives rise to, which then could be thought of as a perceptual state. If, in 
contrast, we are focusing on emotions we arrive at via thinking, this may not 
be such an appealing option and may push one in the direction of judgment 
theories of emotion. 

If we take mental imagery to be emotionally charged, then we can question 
a rarely questioned assumption that has routinely been taken for granted in 
this debate. This premise is that the non-emotional state and the emotional 
state in all these examples are in fact different mental states. It is widely agreed 
upon that emotional states are different from the non-emotional mental states 
that give rise to them: they have different content and also normally different
phenomenology (see esp. Deonna and Teroni 2012, 2015). 

Seeing or remembering the dog does not have emotional valence, whereas 
the emotional state that these non-emotional states give rise to does have 
emotional valence. Further, given that the same emotional state (in our exam- 
ple, fear) can be brought about by very different non-emotional states (for 
example, perceiving, remembering, imagining, thinking), treating the non- 
emotional state and the emotional state as separate seems like a very good 
idea: in this way, the emotional state is not constrained by what kind of
non-emotional state has brought it about.

How, then, can we decide between the two-state picture (a non-emotional 
state gives rise to an emotional state) and the one-state picture (there is only 
one state, which represents in an emotionally charged manner)? 

There may be strictly armchair reasons to question the assumption that the 
non-emotional and the emotional states are separate, maybe for reasons of 
parsimony (bracketing, for a moment, the issue that parsimony consider- 
ations tend to be notoriously problematic, see Sober 2015). When we talk 
about remembering a threatening encounter with a dog and being in an emo- 
tional state, we have no reason to talk about two mental states, rather than 
one: it is not the case that we remember this dog in an emotion-free manner 
and then this emotion-free state gives rise to an emotional state. My remem- 
bering is itself emotionally charged and what constitutes my emotion is this 
emotionally charged memory. Similarly, when I think about a scary dog, it’s 
not that I first have an emotion-free thought, which then gives rise to the 
emotion: my thought about the dog is already emotionally charged. 

But then again, simplicity considerations are rarely decisive and there may 
also be armchair reasons in favor of the two-state picture. Sometimes our 
emotions are not appropriate. If I see (or imagine) a cute fluffy cat and this 
triggers the emotion of fear, there seems to be a mismatch between the per- 
ceptual (or imaginary) representation of the cute fluffy cat and the emotion it 
triggers. This can be neatly explained if we endorse the two-state view: the 
perceptual state gives rise to the emotion, but something went wrong in this 
transition and the emotion fails to fit the perceptual state it is triggered by. In 
order to explain why and how emotions can be fitting or non-fitting, we need 
to posit not one, but two states (Deonna and Teroni 2015). Or so the armchair 
argument goes. 

Moving away from armchair considerations, I want to focus on empirical 
reasons why we should opt for the one-state picture and not the two-state 
picture. Here, the considerations about the connection between emotion and
imagery become relevant. First of all, note that all the alleged non-emotional 
states that give rise to emotions (seeing the dog, remembering the dog, imag- 
ining the dog, thinking about the dog) are mental states that involve either 
perception or mental imagery. Seeing a dog is a perceptual state and both 
remembering and imagining a dog necessarily involve mental imagery (see
Chapters 20 and 22, respectively). Finally, thinking about the dog also involves 
mental imagery, given what we learned about the role of imagery in language 
processing in Chapter 19. So the alleged non-emotional states necessarily 
involve mental imagery. 

If we also assume, as I argued that we should, given the empirical evidence, 
that imagery can be, and often is, emotionally charged, then this means that 
the alleged non-emotional states (seeing, remembering, imagining, thinking) 
can be, and often are, in fact emotionally charged. There is no need to postu- 
late an extra emotional state that is separate from these representations. 

Instead of having to add an additional kind of mental state, namely, emo- 
tional state, to the existing inventory of mental states (of having a belief, per- 
ceiving, remembering, or imagining), we only need to allow for each of these 
kinds of mental states to come in emotion-free and emotionally charged vari- 
eties. Rather than having to posit emotional states as separate and distinctive 
mental states (and then having to worry about their format and content), we 
can use the mental ingredients already present and think of emotions as mod- 
ified versions of being in these mental states.* 

How about the cute fluffy cat example that was supposed to motivate the 
two-state view? Do we need one emotion-free representation of the cute fluffy 
cat and another emotional representation that represents it as scary? Not at 
all. Just as many representations represent some features of the represented 
object correctly and some other features of it incorrectly, the same can be true 
of the emotional representation that the one-state view posits. When seeing 
the Müller-Lyer illusion, the length of the lines is represented incorrectly, but 
the color of these lines is represented correctly. Is this a reason for positing 
two different mental states, one correct, the other incorrect? I don’t think so. 
There is only one perceptual state that represents both the size and the color 
of the lines, the latter correctly, the former incorrectly. Similarly, when I see 
the cute fluffy cat and I am afraid of it, I have one emotional state, which 


* This account would be consistent with the claim that an emotional state is really a set of
dynamically intertwined mental states, some of which would be the “cognitive base” (see, for example,
Scherer 2009; see also Barlassina and Newen 2014). 
represents some properties of the cat correctly, and some others incorrectly. 
There may be a mismatch between the way these properties are represented, 
but this mismatch can be fully accommodated within the one-state picture of 
emotions. 

The one-state picture leaves many features of this one mental state unspec- 
ified. The general idea is that the very same mental state that represents the 
non-emotional features of an event also has emotional valence. These two 
aspects of this one mental state can be combined in a variety of ways. One 
possibility would be that this mental state represents both emotional and 
non-emotional features. So the content of this mental state has emotional as 
well as non-emotional components. The same mental state would represent 
the attacking dog as gray, big, and scary. Gray and big are non-emotional 
properties and scary is an emotional property. But this is not the only way in 
which the emotional and the non-emotional can be combined in one men- 
tal state. 

Another option would be a form of adverbialism about emotions: being in 
an emotional state is just a way of having a belief, perceiving, remembering or 
imagining: having a belief, perceiving, remembering or imagining in an emo- 
tionally charged manner (see Döring 2014; Nanay forthcoming e for a very 
different take on adverbialism about emotions). It’s believing something
emotionally, perceiving something emotionally, remembering something
emotionally, and so on. So when I see a scary dog and I am afraid of it, I am in a
perceptual state. This perceptual state represents the dog in an emotional
manner. As Shakespeare said in King Lear (Act IV, Scene 6): “I see it feelingly.”

For the purposes of this chapter, I want to remain neutral between these 
ways of fleshing out the one-state account of emotions. The bottom line is that
if we take the close link between emotion and mental imagery seriously, we have good
reason to opt for the one-state account of emotions. 


24 
Knowledge 


Some philosophers are interested in perception primarily because perception 
can lead to knowledge. I’m not one of these philosophers. But it is important 
to see that the picture of perception and mental imagery I outlined in the first 
half of the book has important implications for the potential epistemic role 
that perception and mental imagery may play.¹ More precisely, if perception 
is a hybrid of sensory stimulation-driven perception and mental imagery, 
then it is, one may worry, not a very reliable way of learning about the world. 

First, on the face of it, mental imagery itself (regardless of what it does in 
conjunction with sensory stimulation-driven perception) is not in a very 
good epistemic shape. It lacks, by definition, a causal link to the world and 
even in those cases where the indirect causal link is the most reliable, as in 
amodal completion, it fails to satisfy the safety and sensitivity conditions 
of knowledge (Helton and Nanay 2019).² But does this mean that mental 
imagery itself can’t lead to knowledge or even new information? 

This has been an influential line of thought in the history of philosophy. 
Jean-Paul Sartre, for example, famously claimed that “nothing can be learned 
from an image that is not already known” (Sartre 1940/1948, p. 12). Since, on 
his view, “it is impossible to find in the image anything more than what 
was put into it,” we can conclude that “the image teaches nothing” (Sartre 
1940/1948, pp. 146-7). 

Sartre was not always making a clear distinction between imagination and 
mental imagery, so it is not clear whether it is imagination or mental imagery 
that teaches nothing. Contemporary philosophers tend to raise this issue 
about imagination (Langland-Hassan 2016, 2020; see also Kind and Kung 
2016), but the question from our point of view is whether it is true of mental 


1 See also Munro forthcoming on the role of mental imagery in the epistemology of testimony. 

2 Roughly, the safety requirement is the following: Some belief gained by some method is safe just 
in case in all, or nearly all, near worlds where you form that belief on the basis of that method, that 
belief is true. And the sensitivity requirement is that some belief that p, formed by some method M, is 
sensitive just in case in the nearest worlds in which p is false and in which that subject uses M, M does 
not lead that subject to believe p (see Helton and Nanay 2019). 
imagery. And here it seems that even if imagination teaches nothing, mental 
imagery at least sometimes can and does. 

Take the following example. You want to wrap a chocolate box in gift wrap. 
You estimate how big the piece of paper needs to be in order to cover the 
whole box and cut a piece of that size from the roll. When you try to use it to 
wrap the box, you may discover that it is not big enough. Or you may discover 
that the piece you've torn off is too long, so you will waste some of it. Or you 
may discover that it’s just right. 

This task requires mental imagery — visual imagery of the size of paper cov- 
ering the chocolate box. Your judgment about the size of the paper needed to 
wrap the chocolate box is based on your mental imagery. Importantly, it does 
not require the voluntary use of imagination. I might count to three and then 
set out to voluntarily imagine how the piece of paper I am tearing off would 
cover the chocolate box, but this is not necessary. More often, you look at the 
box, look at the wrapping paper and the visual imagery is triggered without 
you voluntarily imagining anything. 

Here, this use of mental imagery gives you new information that you didn't 
have before. When you look at the chocolate box and form (often involun- 
tarily) visual imagery of the wrapping paper needed, you may find your esti- 
mation of the size of the paper unexpected or surprising. Maybe it’s larger 
than you had assumed. Or smaller. Your estimation of the size of the paper 
needed can be very different before and after forming the mental imagery of 
the paper covering the chocolate box (and this can, of course, still be different 
from the size of the paper actually needed, see Gauker 2020; see also Levin 
2006; Gregory 2020). 

In this example, you formed mental imagery on the basis of visual cues. But
you could do the same thing if I ask you what size of wrapping paper you
would need to cover a 20 × 20 × 3 centimeter chocolate box. In this case, you
form the mental imagery on the basis of verbal information. And, as before, 
the mental imagery you form can give you an unexpected and surpris- 
ing answer. 

In short, mental imagery can lead to new information and even knowledge. 

But even if mental imagery can lead to new information and even knowl- 
edge, this does not mean that it always, or even often, does so. And given the 
significant role mental imagery plays in everyday perception—our primary 
source of knowledge—we need to examine the epistemic credentials of the 
forms of mental imagery that are involved in perception. 

Perception sometimes justifies our beliefs. If I see that it is raining outside, 
this may justify my belief that it is raining outside. And much of what we 




know is based on perception. This is true of sensory stimulation-driven per- 
ception. But how about mental imagery? Can mental imagery justify our 
beliefs? If not, then we have a problem. I argued in Chapter 9 that perception 
per se is a hybrid between sensory stimulation-driven perception and mental 
imagery. If sensory stimulation-driven perception can justify our beliefs, but 
mental imagery can't, then there are reasons to worry about the epistemic 
status of the hybrid of the two (Macpherson 2012; Nanay 2020b). 

There are two potential epistemic worries about how the importance of 
mental imagery in perception complicates perceptual justification. The first 
one is about the top-down influences on imagery. The second one is about the 
lack of a direct causal link to the outside world that is built into the definition 
of mental imagery. I am not sure that the former (often referred to as the
cognitive penetration worry) is as serious a problem as it is often made out to be,
but I will discuss it briefly in order to keep it apart from the more serious
latter worry.

The cognitive penetrability worry is this: If perception is cognitively pen- 
etrated, we get a vicious circularity: our beliefs, thoughts, and expectations 
are supposed to be based on and justified by our perceptual states, but these 
perceptual states themselves are influenced by our beliefs, thoughts, and 
expectations (because of cognitive penetration). As Roberto Bolaño says in 
the novel 2666, “People see what they want to see and what people want to 
see never has anything to do with the truth.”³

The challenge from cognitive penetration originally focused on one 
specific account of perceptual justification, namely, dogmatism (Siegel 2011; 
see also Lyons 2011): the view that “whenever you have an experience as of p, 
you thereby have immediate prima facie justification for believing p” (Pryor 
2000, p. 536). The argument was that if perception is cognitively penetrated, 
dogmatism is not an option because the perceptual states that our beliefs are 
supposed to be justified by are themselves influenced by our existing beliefs 
and expectations. This argument has been generalized to apply to other theo- 
ries of justification (not just dogmatism, see Siegel 2011; Tucker 2014; see also 
Lyons 2015; Ghijsen 2016; Silins 2016). 

But is perception cognitively penetrated in the sense relevant for episte- 
mologists? The first thing to note is that for many epistemologists, the 


3 Roberto Bolaño: 2666 (London: Picador, 2009), p. 219. For full effect, the quote continues as
follows: “People are cowards to the last breath, I’m telling you between you and me: the human being,
broadly speaking, is the closest thing there is to a rat.” 
relevant sense of “perception” in the cognitive penetrability of perception 
debate is very clearly the sense of perceptual phenomenology, as many have 
emphasized the role of conscious perception in perceptual justification—a 
premise it would be tempting to argue against (see Berger et al. 2018), but one 
I will accept in the present context for the sake of the argument. In Chapter 11, 
I made a distinction between the claim that perceptual experiences are influ- 
enced by beliefs and expectations and the claim that early perceptual process- 
ing is influenced in a top-down manner by perceptual (or non-perceptual) 
processing further up in the visual hierarchy. The former claim would be rele- 
vant for epistemology, but it is notoriously difficult to argue for or against. 
And while we have plenty of evidence for the latter claim, its relevance for 
classic epistemological questions is not entirely clear. 

As a result, it does not seem that top-down influences on perception (or 
cognitive penetration) should give significant cause for concern to anyone 
who is interested in perceptual justification. Empirical findings about top- 
down influences on early perceptual processing do not seem to jeopardize 
any philosophical account of perceptual justification. Epistemologists are 
worried about whether our perceptual experiences are influenced by our 
beliefs or by other cognitive states. They tend not to be too interested in 
whether the primary visual cortex is influenced by the V4/V8 or the MT (but 
see Helton and Nanay 2022 on the epistemological significance of such
top-down influences).

Whether or not the worry about top-down influences on perception is a 
genuine epistemic worry, I want to argue that the real epistemic problem of 
the importance of mental imagery in perception lies elsewhere. 

Mental imagery may or may not be influenced by top-down information. 
And even when it is, it is not clear how far up this top-down information 
comes from. But even in those cases where mental imagery is not at all influ- 
enced by top-down information, it fails to be directly caused by what it rep- 
resents. Even in the somewhat trivial case of the blind spot, where, supposedly, 
no top-down information is being used, the blind spot is filled in by mental 
imagery—by perceptual processes not directly triggered by the sensory input. 
So, no matter what way the blind spot is filled in, that has no direct causal 
connection with whatever is in front of that part of the retina. It follows from 
the definition of mental imagery that it fails to be directly caused by what it 
is about. 

Sensory stimulation-driven perception is perceptual representation directly 
caused by the sensory input, but mental imagery is defined precisely by the 
lack of such a direct causal link to the sensory input. In any kind of broadly
externalist account of justification,⁴ this raises worries about the epistemic
work that mental imagery can do (as the reliability of mental imagery is sup- 
posed to depend on the directness of the causal link between mental imagery 
and what the mental imagery is about). 

This does not mean that mental imagery does no epistemic work, as the 
lack of a direct causal link would be compatible with the mental imagery 
nonetheless carrying information about the external world reliably— 
especially if we endorse a concept of reliability that is defined not in terms of a 
direct causal link, but rather in terms of “having a good track record” (see 
Goldman 1999 for analysis; see also the discussion of this in the case of 
amodal completion in Helton and Nanay 2019). But if we take the importance 
of mental imagery in perception seriously, we need to examine the reliability 
of these non-direct causal links of mental imagery. 

Perception is supposed to be a good source of knowledge because percep- 
tion tracks truth. But mental imagery is, by definition, a step removed from 
the truth it is supposed to track. Of course it can track truth, albeit in a fallible 
manner. The mental imagery used for filling in the blind spot, for example, is 
really very reliable. It can be fooled, but in the vast majority of cases it isn’t. So 
the mental imagery that is used to fill in the blind spot does track truth—not 
100 percent reliably, but nonetheless reliably enough. And the reason we 
know this is that we know the exact mechanisms of how the visual system 
uses the sensory stimulation around the blind spot as an input when filling in 
the blind spot. If this mechanism were less reliable, this mental imagery 
would fail to track the truth. 

But then the same question needs to be asked about those forms of mental 
imagery that play a more important role in everyday perception: about 
whether the mechanisms that construct these forms of mental imagery are 
reliable enough. Whether perception can justify beliefs depends on empirical 
facts about the reliability of the mechanisms of mental imagery involved in 
perception. 

Again, we can make the default assumption that our perceptual system 
built pretty good mechanisms for constructing mental imagery on the basis of 


4 I have been working with a broadly externalist conception of perceptual justification: justification 
requires some degree of reliability. But externalism is not the only epistemic game in town. Internalists 
think that whether or not my belief is justified depends only on mental states I have conscious access 
to. So, strictly speaking, internalists could just deny that the reliability of perception has anything to 
do with perceptual justification at all. I do think that internalists also have a lot to worry about the 
involvement of mental imagery in perception, especially in the light of some recent findings that show 
that we experience amodally completed features as more reliable than features that are not amodally 
completed (Ehinger et al. 2017), but I will not pursue this argument here. 
contextual or crossmodal information that does co-vary with the scene in 
front of us. The blind spot is a good example. We can fool the filling in of the 
blind spot, but this happens very rarely and only in exceptional circumstances 
(and only in monocular vision, for a start). 

But if perception is really a mixture between sensory stimulation-driven 
perception and mental imagery, then we cannot take it for granted that per- 
ceptual justification is unproblematic. We need to examine the mechanisms 
of mental imagery to see how reliable they are and what role they can play in 
perceptual justification. 

A lot more work needs to be done in order to show that we are justified in 
moving from (imagery-infused) perception to belief. Again, this is not to say 
that we can’t eventually do so; we surely can. But any such move would need 
to involve a close empirical examination of the reliability of the processes that 
constitute mental imagery. 

It is also important to stress that when I say that almost all of our perceptual
states are in fact mixed sensory stimulation-driven/mental imagery
states, I do not mean to suggest that the contributions of sensory
stimulation-driven processes and non-sensory-stimulation-driven processes (that is,
mental imagery) are approximately equal. In fact, it happens very rarely that they
are equal. But the very fact that the mental imagery component is always
lurking in the background should prevent us from taking perception at face 
value when it comes to perceptual justification. 

The conclusion is that the question of perceptual justification is, at least in 
part, an empirical question—it requires the examination of the reliability of 
the forms of mental imagery that play a role in perception per se. This is a 
sense (a fairly narrow sense, to be sure) in which epistemology needs to be 
naturalized. 


PART V 
ACTION 
Desire 


We have seen how important mental imagery is for understanding perception 
and even a number of cognitive phenomena. In Part V of the book, I argue 
that, maybe even more surprisingly, it is also crucial for understanding many 
aspects of our motivation and execution of actions, starting with the concept 
of desire. 

I will argue that the goal state of desires is represented by mental imagery. 
In order to have a desire to achieve some goal, one needs to represent the goal
state of this desire. The
representation of the goal state does not fix the content of the desire. We could 
have two desires that represent the very same goal state, but nonetheless have 
different contents (maybe because they represent different ways of achieving 
the goal state or a different degree of urgency). But all desires need to repre- 
sent the goal state. And I will argue that representing this goal amounts to 
having mental imagery of this goal. 

Some preliminary remarks about desires: I will not attempt to give an ordi- 
nary language analysis of an ordinary language term, namely, desire. It may 
be that there are mental phenomena that we refer to with the label “desire,”
but that are not captured by the analysis I give here. I will restrict my argument
to occurrent desires, by which I don’t mean necessarily conscious desires, but 
rather desires that play an active role in our mental life at the given moment.¹

I will also restrict my argument to desires to perform an action. This rules 
out desires for past states of affairs, for example. Again, I don’t want to deny 
that we can call desires for past states of affairs genuine desires—I'm not doing 
ordinary language analysis. But I restrict the scope of my argument to desires 
to do something. And I will assume that a desire to do something can, at least 
in principle, lead to the action it is about. We can call desires of this kind exe- 
cutable desires. 

So why is it that we represent the goal state of our desires by means of men- 
tal imagery and not in some other way (say, by having a belief about it)? I will 


1 I set aside the debate about whether standing desires are genuine desires or merely dispositions to 
form occurrent desires. 
give an empirically inspired argument for this claim and then show that 
mental imagery fulfills the two most important functional desiderata on the 
goal state of desires (evidence insensitivity and the relation to perceptual 
attention). 

The argument comes from empirical studies concerning the link between 
desires and mental imagery. Strong occurrent desire is invariably accompa- 
nied by vivid mental imagery (Kavanagh et al. 2009). Further, stronger desires 
(for example, to smoke) are accompanied by more vivid mental imagery (of 
smoking-related scenes) (Tiffany and Drobes 1990; see also Tiffany and 
Hakenewerth 1991). Similarly, desire for consuming alcohol can be induced 
by the mental imagery of entering one’s favorite bar, ordering, holding, and 
tasting a cold, refreshing glass of one’s favorite beer. In fact, this guided men- 
tal imagery triggers a stronger desire than actually seeing a glass of beer (Litt 
and Cooney 1999). More generally, the vividness of mental imagery is cor- 
related with the strength of one’s desire for a range of desirable substances and 
activities (Harvey et al. 2005; May et al. 2008; Kavanagh et al. 2009; Statham et 
al. 2011; see also the literature on desire thinking (Caselli and Spada 2011; 
Spada et al. 2013; Sadeghi et al. 2017); and see also Kemps and Tiggemann 
2015 and Papies and Barsalou 2015 for summaries). 

Mental imagery of neutral scenes, for example a rose garden, reduces desire 
for a cigarette in people who are trying to give up smoking (May et al. 2010). 
Mental imagery of unrelated odors has the same effect (Versland and 
Rosenberg 2007). Desire for eating chocolate can also be reduced by the mental
imagery of neutral scenes (Harvey et al. 2005; Kemps and Tiggemann
2007) and also by engaging involuntary mental imagery (by, for example, 
making a little figurine from clay or playdough without seeing your hands, 
which is an instance of involuntary multimodal mental imagery, where one’s 
visual imagery is triggered by tactile input (Kemps et al. 2014)). 

A useful distinction in the psychological study of desire is the distinction 
between wanting and liking (Berridge and Zajonc 1991; Berridge and 
Robinson 1995). Crucially, we can have a strong desire for something we do 
not actually enjoy—this happens in the case of addiction and also, arguably, 
in many weakness of will cases. This distinction is incorporated into many 
contemporary philosophical accounts of desire (Holton 2009). Importantly, 
vivid mental imagery of the reward can activate the wanting system in the 
same way as perceptual triggers of the reward can, and this can occur inde- 
pendently of the liking system (Berridge and Robinson 2003). 

The interaction between desire and mental imagery is even more compli- 
cated. Repeated exercise of mental imagery of a certain food product reduces 
desire for this food product, which is a case where negatively valenced mental 
imagery results in the weakening of the desire (Morewedge et al. 2010; but see 
also Missbach et al. 2014 for some further wrinkles). Even crossmodally
triggered mental imagery reduces desire: for example, looking at salty food
pictures, which triggers olfactory and gustatory imagery, reduces the enjoyment
of salty food (Larson et al. 2014; see also Spence et al. 2016 for a summary). 

These results show a very close link between desires and mental imagery. 
But one might wonder whether they also show that the goal state itself is rep- 
resented by means of mental imagery. And, strictly speaking, they don't. Some 
of these results show that mental imagery influences desires. Others show 
that mental imagery is a downstream consequence of desires. In short, if we 
manipulate mental imagery, the desire changes and if we manipulate desires, 
the mental imagery changes. 

A straightforward way of explaining this two-way influence is by taking the 
goal state of desires to be represented by mental imagery. A much less 
straightforward and much less elegant way of explaining this two-way influ- 
ence would be to posit two different causal relations, namely, from mental 
imagery to desire and then from desire to downstream mental imagery. So 
the goal state of desire would not be represented by mental imagery, but it 
would cause and be caused by mental imagery. While this explanation could 
not be ruled out, the explanation in terms of the mental imagery of the goal 
state would be preferable on grounds of simplicity. Simplicity considerations 
are complex and debatable (Baker 2003; Sober 2015), but it is important to 
notice that here the alternative explanation would need to posit not one but 
two mental processes (the imagery → desire causal link and the desire →
imagery causal link) and the only reason to posit these is to salvage this
alternative explanation. The explanatory scheme I am proposing does not need to
posit anything we don't already have reasons to posit. 

The goal state of desires has a certain functional profile, and I will argue 
that mental imagery is well-suited (maybe even uniquely well-suited) to fulfill 
this functional profile. I will examine two important functional features 
(or desiderata) of the goal state of desires: its evidence insensitivity and its 
relation to perceptual attention. 

The first functional desideratum is evidence insensitivity. Empirical findings
show that the goal state of our desires is very often (albeit not always)
systematically misrepresented: our representation does not capture the
goal state faithfully, but rather a more positively valenced, often even
idealized version of the goal state (see Andrade et al. 2012 for a summary of the
research on this). So when the smoker has the desire to go outside and smoke 
a cigarette, the goal state of the desire is represented in a manner that is more 
positively valenced than we have reason to represent it (“the best cigarette I 
have ever smoked”). The smoker knows that it is raining, the alleyway is stinky 
and she has a mosquito bite on her elbow that is really itchy, but none of this 
shows up in the representation of the goal state. Further, this
misrepresentation of the goal state persists in the face of a lot of conflicting evidence (even if
the smoker is reminded of the rain and the mosquitos). In short, the
representation of the goal state of desires is very often insensitive to evidence. This puts
a constraint on what kind of mental state this goal state representation can be. 

This desideratum immediately rules out some familiar mental states from 
being the representation of the goal state of desires. Beliefs, for example, are 
supposed to be sensitive to contradicting evidence (Helton 2020). But mental 
imagery isn't. This is not to say that beliefs are always sensitive to contradict- 
ing evidence (Harman 1984), but they are supposed to be. There is no such 
constraint on mental imagery. Thus, given that the representation of the goal 
state often shows systematic insensitivity to contradicting evidence, it seems 
that the goal state of the desire is not represented by beliefs. But as mental 
imagery also often shows systematic insensitivity to contradicting evidence, it 
satisfies the first desideratum (note that this may show that the representation 
of the goal state of desire is not a belief, but it does not show that it must be 
mental imagery, as belief and mental imagery may not be the only two possi- 
ble representations). 

The second functional desideratum is about keeping our attention on the 
task of achieving the goal state. A desire will not lead to action if we lose focus
on what it is that we wanted and how we wanted to do it. And given that the
world is full of desirable things, in order to see to it that your desire gets satis- 
fied, you need to maintain attention to your original desire to prevent it from 
being dislodged and being replaced by another, new desire. In short, if a desire 
is to stand a chance to lead to action, it needs to focus your attention on the 
goal state and the way to achieve this goal state (see, for example, Scanlon 1998). 

And here we have a lot of empirical results showing that the attention in question, 
that is, the attention that plays a role in maintaining desires, includes a form 
of visual attention, which uses the resources of visual working memory (see 
Kemps et al. 2010; see also Andrade et al. 2012 for a summary). As a result, 
performing tasks that use visual working memory weakens desires and strong 
desires make us perform badly on tasks that require visual working memory. 
Given that the only representation that competes for the same resources as 
visual working memory is mental imagery (see Chapter 20), it follows that the 
attention that maintains our desires is attention to the desired state of affairs 
that involves visual (and often also olfactory, gustatory, etc.) imagery. In order 
to actually get up, go to the kitchen, get out the milk from the fridge and pour 
myself a glass of milk (or perform some even more attention-demanding 
goal-directed action), I need to maintain my desire to drink some milk in the 
face of all the other fun things I could be doing. And I do so by attending to 
the mental imagery of the goal state (the mental imagery of the milk carton or 
of the action of drinking the milk). The mental imagery that is involved here 
does not need to be particularly vivid or determinate. But the more determi- 
nate it is, the more likely it is to be able to engage your attention and see you 
through this daunting task. 

Finally, I need to address a potential worry. What is the scope of these 
claims? What kind of mental imagery is involved in a desire not to do some- 
thing? More generally, what kind of mental imagery is involved when we have 
a desire to do something that is difficult to visualize? Maybe the desire to 
prove the fundamental theorem of Galois theory in algebra? How about the 
desire not to prove the fundamental theorem of Galois theory in algebra? 

The answer is twofold. First, the goal state of these desires is likely to be less 
specifically (that is, less determinately) represented. The content of many 
desires seems to be quite indeterminate. As La Rochefoucauld said, “We 
should earnestly desire but few things if we clearly knew what we desired.”* 
And this, again, is a good match with mental imagery that can also be inde- 
terminate. Second, and more decisively, it is important to remember that the 
claim that the goal state of desires is represented by means of mental imagery 
does not imply that the overall content of the desire is fixed by the mental 
imagery alone as the content of a desire is not exhausted by the representation 
of the goal state of this desire. Finally, if none of this is convincing enough for 
the reader, as long as we can make a principled distinction between opaque 
and non-opaque desires (I tried to do so in Nanay forthcoming h), we can restrict 
the claims I defended in this chapter to the latter category, as executable 
desires (what this chapter is about) would count as non-opaque. 

To sum up, we have strong empirical reasons to believe that the goal state 
of desires is represented by mental imagery and mental imagery satisfies the 
two desiderata on the functional roles the representation of the goal state of 
desires plays. The view that the goal state of desires is represented by mental 
imagery has some fairly surprising consequences for the way we should think 
about desires. The most important of these is that desires don't have a desire- 
like direction of fit. 

* La Rochefoucauld: Maximes, 1665, section 439. 

The concept of “direction of fit” has played an important role in the philos- 
ophy of mind in the last half-century (Anscombe 1957; Platts 1979; Searle 
1983; Smith 1987; Humberstone 1992). As we have seen in Chapter 17, some 
representations represent the world as being a certain way. They attribute 
properties to objects. If the represented objects have these properties, the rep- 
resentation is correct. If they do not have these properties, it is incorrect. 
Representations of this kind have a “mind-to-world” direction of fit. 

Some other representations, in contrast, have a “world-to-mind” direction 
of fit: they do not describe how the world is, but prescribe how the world is 
supposed to be. To take Anscombe’s famous example, a shopping list has a 
“world to list” direction of fit, but if a detective follows the shopper around, 
making notes about what he buys, these notes have a “list to world” direction 
of fit (Anscombe 1957). Desires and intentions have a “world-to-mind” direc- 
tion of fit, whereas beliefs and perceptual states have a “mind-to-world” 
direction of fit. 

While this distinction may seem to be a straightforward one, the concept of 
direction of fit has been used in a variety of different ways and it has been 
used to play a variety of different explanatory roles (Gregory 2012; Frost 
2014). And there are various distinctions one can and should make between, 
for example, normative and descriptive theories of direction of fit. 

I want to add another distinction between different ways of thinking about 
direction of fit. Direction of fit can be intrinsic or extrinsic.* What I take to be 
the central and original concept of direction of fit is about the way the repre- 
sentation represents its object. The general idea is that there are two ways of 
representing: descriptively and prescriptively. 

Those accounts of direction of fit that contrast correctness conditions 
(which is the mark of mind-to-world direction of fit) and satisfaction condi- 
tions (which is the mark of world-to-mind direction of fit) assume that direc- 
tion of fit is intrinsic (Platts 1979; Searle 1983; Velleman 1992; see also Lauria 
2017). These accounts construe the content of beliefs as correctness conditions 
(which is why beliefs have mind-to-world direction of fit) and the content of 
desires as satisfaction conditions (which is why desires have world-to-mind 
direction of fit). In other words, the difference in direction of fit between these 
two kinds of representation, according to this way of thinking about direction 
of fit, is a difference in content: the representation relation itself. In other 
words, beliefs and perceptual states represent descriptively: they represent 
how things are, while desires and intentions represent prescriptively: they 
represent in a way that pushes you to change things. There are two different 
ways of representing. 

* Depending on one’s views about the content of mental representations, different concepts of 
direction of fit will be more appealing. If one takes the paradigmatic mental representations to be 
propositional attitudes and endorses some form of strict distinction between the attitude and the 
proposition, then focusing on extrinsic direction of fit would be a more natural choice (as the proposi- 
tional content remains the same if the attitude changes). If one takes the content of mental representa- 
tions to include some form of “mode of presentation,” then focusing on intrinsic direction of fit would 
be a more natural choice. 

I call this concept of direction of fit intrinsic direction of fit, as it is sup- 
posed to be a feature of the representation relation itself. It is the representa- 
tion relation itself that has intrinsic direction of fit. Intrinsic direction of fit 
could be contrasted with extrinsic direction of fit, which is not a matter of the 
representation relation itself, but rather of how the representation is used or 
what functional role it plays (Smith 1987; Humberstone 1992; Gregory 2012). 
If you have a false belief, you need to (or tend to) change your belief—you 
need to make it fit the world. If, in contrast, you have an unsatisfied desire, 
you need to (or tend to) change the world in such a way that your desire is 
satisfied. So you need to (or tend to) make the world fit your desire. 

Again, this distinction is not about the way in which the two representa- 
tions represent. It is about what we should do (or what we tend to do). So, we 
can maintain this distinction between extrinsic mind-to-world and world-to- 
mind directions of fit while denying that representations can represent in two 
different ways. Even if, as one might argue (see below), representations only 
represent descriptively, that is, even if they can only have mind-to-world 
intrinsic direction of fit, all the options about extrinsic directions of fit are still 
very much open. We can still make a distinction between extrinsic world-to- 
mind and mind-to-world direction of fit as what we should change in the case 
of misrepresentation is not something that has to be (or even can be) built 
into the representational relation itself. So, it is possible that the functional 
role of desires is such that if the desire is not satisfied, we should change the 
world. And the functional role of beliefs is such that if the belief is not true, 
we should change the beliefs. It would also be consistent with my claim about 
intrinsic direction of fit (that is, that there is only mind-to-world intrinsic 
direction of fit) to deny that the concept of (extrinsic) direction of fit is useful 
or coherent (see Frost 2014). 

If the argument I presented in this chapter is correct, then desires don't 
have world-to-mind (or prescriptive) intrinsic direction of fit. To put it 
provocatively, desires don't have desire-like direction of fit. 

I argued that the goal state of desires is represented by mental imagery. The 
question is how this goal state is represented: descriptively or prescriptively? 
The goal state is a state of affairs: me taking out the milk from the fridge, for 
example. There is nothing prescriptive about the representation of this state of 
affairs. More generally, mental imagery represents descriptively. Visualizing 
an apple represents the (imaginary) apple as having (imaginary) properties. 
And having mental imagery of the goal state represents the goal state (me 
eating the apple) as having some imaginary properties (of tasting good, 
tasting bad, tasting neutral, and so on). 

More slowly: the question here is more specifically about the intrinsic 
direction of fit of mental imagery. In other words, the question is whether it 
represents descriptively or prescriptively. Mental imagery, as we have seen, is 
perceptual representation that is not directly triggered by the sensory input. It 
is a form of perceptual representation, and the difference between sensory 
stimulation-driven perception and mental imagery is that of etiology: one is 
triggered directly by the sensory input and the other isn’t. In other words, 
mental imagery represents the way perception represents. And as it is univer- 
sally agreed that perception has mind-to-world intrinsic direction of fit, it 
follows that mental imagery also has mind-to-world intrinsic direction of fit. 

So, the representation of the goal state of desires has descriptive, that is, 
mind-to-world, direction of fit. Not desire-like direction of fit, but belief- 
like or perception-like direction of fit as mental imagery has mind-to-world 
direction of fit. This, of course, leaves open the possibility that desires repre- 
sent both descriptively and prescriptively, or, as it is often put, that they 
have both directions of fit: they both describe how the world is and pre- 
scribe an action. 

Some representations are said to have both directions of fit: they represent 
the way the world is, and at the same time prescribe how the world is sup- 
posed to be (see Millikan 1995; Clark 1997; Pacherie 2000; Pacherie 2011 for 
the concept of double direction of fit). Ruth Millikan introduced the term 
“Pushmi-Pullyu representations” for representations that have both direc- 
tions of fit (Millikan 1995): they represent the way the world is and at the 
same time prescribe how the world is supposed to be. The sentence “Dinner!” 
for example, has such double direction of fit: it both describes something 
(that dinner is ready) and prescribes an action (that one should come to eat 
dinner). 

The proposal then would be that, besides mind-to-world (descriptive) 
direction of fit, desires also have world-to-mind (prescriptive) direction of fit. 
Or maybe while some components of the desire (the representation of the 
goal state) have mind-to-world direction of fit, the whole desire has world-to- 
mind direction of fit. Even if desires (or some of their components) have 
descriptive direction of fit because they represent their goal states descrip- 
tively, this does not exclude the possibility that they also represent 
prescriptively. 

What would be the justification for positing double direction of fit for 
desires though? Given that we need to posit descriptive direction of fit in 
order to account for the representation of the goal state of desires, we would 
need pretty strong reasons to also attribute the opposite direction of fit. The 
standard reason why desires are described as having world-to-mind direction 
of fit is that they play a role in motivating action and if they only represented 
the world descriptively, this motivating role would be difficult to account for. 

This argument has been made against the most influential theory that 
attributed mind-to-world direction of fit to desires, the besire theory. 
According to the besire theory, desires are beliefs of a certain kind (thus the 
witty label). Besire theories come in many colors (Smith 1987, 1994; Lewis 
1988; Gregory 2017; see also Lauria 2017), but it is important to point out that 
the account of desires I am arguing for here is not a besire theory. Beliefs are 
propositional attitudes and given that according to the besire theory, beliefs 
and besires are the only building blocks of the mind, it is a fair question to ask 
how beliefs and further beliefs (as desires are also beliefs) manage to motivate 
us to act. 

My account, according to which the goal state of desires is represented by 
mental imagery, fares better in this respect. As we have seen, mental imagery 
can be (although it does not need to be) emotionally charged or valenced (see 
Chapter 23). If you visualize your long-deceased grandmother, this imagery 
can be emotionally charged, for example. And valenced imagery can motivate 
us to act. This has been an old theme in the history of philosophy, going back 
at least to Aristotle (see McMahon 1973). 

Empirical research also shows that valenced imagery can motivate us to 
act. Valenced mental imagery of performing an action increases the proba- 
bility of task completion (at least in the case of tasks we tend to put off; 
see Renner et al. 2019). Further, as we have seen in Chapter 22, if you are 
undecided between two positive options, the one that is imagined more 
vividly tends to win out. And if you are undecided between two negative 
scenarios, the one that is imagined less vividly tends to win out (Austin and 
Vancouver 1996; Trope and Liberman 2003). But if valenced imagery can 
motivate us to act, and if desires represent their goals by means of imagery, 
then we have no prima facie reason to posit world-to-mind direction of fit 
to desires. 

The proposal that desires do not have desire-like, that is, world-to-mind 
direction of fit may not be that radical. Some philosophical theories of desire 
would be consistent with the view I argued for. An important example is 
Tim Schroeder’s empirically informed theory of desire that takes desires to be 
the mental states that drive reward-based learning (Schroeder 2004). There is 
no prima facie reason to think that this mental state that drives reward-based 
learning would need to have world-to-mind direction of fit. The reward-based 
theory of desire says nothing about mental imagery, but it could be supple- 
mented by an account of the role of mental imagery in desires (which would 
explain at least some features of the reward-based learning process (see esp. 
Berridge and Robinson 2003)). 

A further consideration comes from phenomenology. It has been pointed 
out that there is a mismatch between the phenomenology of desires and their 
alleged world-to-mind direction of fit (Hulse et al. 2004). What shows up in 
our phenomenology when we have an occurrent desire are only states with 
mind-to-world direction of fit, say, perception or imagery of the desired 
object or introspection of my (bodily) states. In other words, on the face of it, 
and bracketing the usual worries about the unreliability of arguments from 
phenomenology, positing only mind-to-world direction of fit for desires 
seems to match our naive conception of desires. 

Finally, the proposal that desires do not have desire-like direction of fit is 
very much consistent with empirical research on desires. One of the most 
influential theories of desire in psychology is the “elaborated intrusion” the- 
ory (Kavanagh et al. 2005; May et al. 2014). According to this view, forming a 
desire is a two-step process. First, a mental state that represents the desirable 
state of affairs intrudes into our mind. This often happens unconsciously, and 
it is often not clear what triggers this intruding mental state. The second step 
is that this representation is elaborated with the help of mental imagery. 

The first thing to note is that if we parse the elaborated intrusion theory as 
a philosophical view, it takes mental imagery to be constitutive of desires: the 
intruding representation is elaborated with the help of mental imagery.* Even 
more importantly, it is also a theory that posits only mind-to-world direction 
of fit for desires. The original intruding representation has a mind-to-world 
direction of fit and the mental imagery that is used to elaborate this intruding 
representation also has mind-to-world direction of fit. 


* This view about the centrality of mental imagery in desires may also be consistent with those 
philosophical accounts of desire that construe desires as the appearances of the good (Oddie 2005; 
Tenenbaum 2007). 


Pragmatic Mental Imagery 


Mental imagery plays an important role not only in desires, but also in action 
execution. A more obvious claim would be that motor imagery plays such a 
role, and I will argue in Chapter 27 that it indeed does, but so does bona fide 
mental imagery. Before I argue for these two claims, it is important to make a 
distinction between mental imagery and motor imagery. In fact, we can use 
the general theoretical framework of mental imagery to understand motor 
imagery better.¹ So before I turn to the discussion of the role of sensory men- 
tal imagery in action performance in the second half of this chapter, I will talk 
about the relation between motor imagery and sensory mental imagery, and 
continue to discuss motor imagery in Chapter 27. 

Motor imagery has been traditionally understood as the feeling of imagining 
doing something. It is sometimes taken to be necessarily conscious, and not 
only by philosophers (Currie and Ravenscroft 1997) but also occasionally 
even by psychologists (Jeannerod 1994, 1997; see also Brozzo 2017, esp. pp. 
243-4 for an overview). And as imagining tends to be a voluntary act, motor 
imagery is also often taken to be voluntary. So the paradigmatic example here 
is closing your eyes and imagining reaching for an apple. 

But just as in the case of mental imagery, examples of this kind are not 
representative of motor imagery. Motor imagery, just like mental imagery, can 
be conscious or unconscious (see, for example, Osuagwu and Vuckovic 2014) 
and it can also be voluntary or involuntary. In order to understand how we 
can generalize to involuntary and unconscious cases, we should follow a 
piece of methodological advice from one of the most important researchers 
working on the cognitive neuroscience of action, Marc Jeannerod.² 


1 I have contrasted and will continue to contrast motor imagery (a motor phenomenon) and men- 
tal imagery (a perceptual phenomenon). But as motor imagery is undoubtedly a mental representa- 
tion, this terminology is potentially misleading. To mitigate this, I will often say “sensory mental 
imagery” in this chapter, to indicate that what I have in mind when referring to mental imagery here is 
perceptual representation that is not directly triggered by the sensory input. 

2 Jeannerod sometimes also took it for granted that motor imagery is necessarily conscious; he 
even defines motor imagery once as “the ability to generate a conscious image of the acting self” 
(Jeannerod 2006, p. 23). But when actually using this concept, he drops the assumption that motor 
imagery is conscious. See esp. Frak et al. 2001; Jeannerod 2001; and the discussion between Jeannerod 
and Rizzolatti following Rizzolatti 1994 about unconscious imagery (also confirmed in personal com- 
munication, Nottingham, March 2001). 

Jeannerod writes: “Motor imagery would be related to motor physiology in 
the same way visual imagery is related to visual physiology” (Jeannerod 1994, 
p. 189). And rightly so: if visual imagery is “early” cortical activation that is 
not directly triggered by sensory input, then motor imagery is “late” cortical 
activation that does not directly trigger bodily movement (see Nanay 2020a). 

More slowly: In the case of visual perception, light hits the retina and this 
retinal stimulation then triggers processing in the primary visual cortex (V1) 
and then in other early cortical visual areas like V2, V4/V8, or MT. When 
processing in these early cortical areas is not triggered directly by retinal 
stimulation, we have mental imagery. 

We get the converse picture with motor imagery. To simplify a bit, when we 
perform an action, before our body moves, there is processing in the primary 
motor cortex (M1). And before that, we get processing in the premotor cortex 
and in the supplementary motor area (SMA) (and before that, in the posterior 
parietal cortex (PPC)). So processing in PPC, SMA, the premotor cortex, and 
M1 triggers bodily movement. When processing in the motor cortex does not 
directly trigger bodily movement, we get motor imagery. Motor imagery is 
cortical motor processing that does not directly trigger motor output. 

The paradigmatic example of imagining grasping the apple will come out 
as motor imagery on this definition as we have a large and growing literature 
on the involvement of the motor cortex in conscious and voluntary motor 
imagery (like deliberately imagining doing something). Processing in the pre- 
motor cortex and the supplementary motor area during conscious and volun- 
tary motor imagery has been known for a long time (Roland et al. 1980; Fox 
et al. 1987; Decety et al. 1990, 1994; Stephan et al. 1995; Filimon et al. 2007). 
The same goes for the posterior parietal cortex (Aflalo et al. 2015). 

There have been some controversies about the involvement of the primary 
motor cortex in voluntary motor imagery (Roland et al. 1980; Decety et al. 
1994; Stephan et al. 1995). But more recently there is converging evidence 
that the primary motor cortex is active during conscious and voluntary motor 
imagery (Gandevia and Rothwell 1987; Georgopoulos et al. 1989; Porro et al. 
1996; Roth et al. 1996; Schnitzler et al. 1997; Richter et al. 2000; Miller et al. 
2010; Saruco et al. 2017; see also Dechent et al. 2004 for an error theory of 
why earlier studies failed to find the involvement of M1 in motor imagery). 

It is important that this is a functional, not a physiological, way of defining 
motor imagery (just as the definition of mental imagery was also functional 
and not physiological). In the case of mental imagery, processing in V1 that is 
not triggered directly by visual input was not necessary for mental imagery. If 
V1 is silent, but there is processing in V2 or V4 that is not triggered directly 
by visual input, we still get mental imagery. What is important is that mental 
imagery is perceptual processing not directly triggered by sensory input. 

Similarly, I’m not claiming that activity in M1 that does not directly trigger 
bodily movements is necessary for motor imagery. Even if M1 is silent but 
the premotor cortex or the SMA is not, and there is no overt movement, we 
can still talk about motor imagery (see, for example, Gentili et al. 2004; 
Hanakawa et al. 2008; Gandrey et al. 2013). It is important that we do not 
need to resort to neuroimaging in order to find out whether the subject exer- 
cises motor imagery—we can also use behavioral methodology. One such 
behavioral method involves eye tracking, as motor imagery evokes very spe- 
cific eye movement patterns that are very different from those of visual mental 
imagery (and that are present both in conscious and in unconscious motor imagery; see 
Poiroux et al. 2015 for a summary of the research on this). 

Here is a brief way of summing up the structural relation between mental 
imagery and motor imagery. Whereas mental imagery is the first stop of per- 
ceptual processing that is not directly caused by any input, motor imagery is 
the last stop of motor processing that does not directly cause any output. 

This way of thinking about motor imagery can also help us with a notori- 
ous unclarity about the traditional, phenomenological way of zeroing in on 
motor imagery as the feeling of imagining doing something. As acknowl- 
edged by all involved in this debate, not all imaginative episodes of doing 
something would count as motor imagery: you somehow need to imagine 
doing something from a first-person, and not a third-person perspective. 
Jeannerod himself made a distinction (following the practice in sport psy- 
chology) between internal (first person) and external (third person) imagery, 
and only the former would count as motor imagery (the latter would be sen- 
sory imagery of me doing something; see Jeannerod 1994, p. 189). As this 
phenomenological distinction between first-person and third-person imagery 
is vague (something acknowledged by Jeannerod 1997), using the functional 
criterion is preferable. 

I have emphasized the symmetry between the way sensory mental imagery 
relates to input and motor imagery relates to output. But the relation between 
sensory and motor imagery is more complex. To put it very simply, motor 
imagery necessarily involves sensory mental imagery. This is hardly surprising 
when we think of conscious examples of motor imagery: imagining touching 
the camera of my laptop involves some form of sensory mental imagery 
(maybe visual imagery of my finger touching the camera, or, maybe, more 
minimally, proprioceptive mental imagery of my finger being at a different 
location from where it is now). 

But there are also empirical reasons to think that motor imagery necessarily 
involves sensory mental imagery. In a recent experiment (Kilteni et al. 2018), 
subjects had to imagine touching their right index finger with their left index 
finger. We know that self-touch is very different from being touched by some- 
one else (as the famous example of the impossibility of tickling oneself shows). 
But imagining touching one’s finger has a very similar sensory profile to 
actual self-touch (and very different from touching or being touched by 
someone else or an inanimate object). The researchers conclude that motor 
imagery necessarily entails representing the sensory consequences of the 
imagined action. In terms of this book this amounts to having (temporally 
forward-looking) sensory mental imagery (see also Bennet and Reiner 2022 
for further support for this claim). 

This intertwining of motor and sensory imagery makes it even more prob- 
lematic to rely on introspective ways of identifying motor imagery (and 
keeping it apart from motor imagery-free sensory imagery of our own body). 
But even more importantly, the necessary involvement of sensory mental 
imagery in motor imagery makes mental imagery even more important for 
understanding various components of action planning and action execution, 
as we shall see in Chapter 27. This book is about mental imagery, not motor 
imagery (a topic that would deserve a book-length study in itself), so I will 
only briefly discuss, in Chapter 27, the role of motor imagery in action exe- 
cution. But there are lots of exciting questions about motor imagery that I 
want to leave open here, especially the extent to which various aspects of 
motor imagery (its content, for example) could be explained in terms of the 
sensory imagery it necessarily involves. 

I will come back to the concept of motor imagery in Chapter 27. But this 
chapter is about a form of mental imagery that nonetheless is crucial for 
action execution: pragmatic mental imagery. 

Mental imagery, unlike motor imagery, is perceptual representation. It can 
nonetheless play an important role in action execution (see also Van Leeuwen 
2011 for a related argument). Some of our actions (in fact, most of our 
actions) are perceptually guided actions: our perceptual states trigger and 
guide our actions. When we pick up a coffee cup to drink from it, this is a 
perceptually guided action: our perceptual state represents the spatial loca- 
tion of the cup, which then guides our reaching movement. 

In Nanay 2013a, I called perceptual states of this kind, that is, perceptual 
states that guide our motor actions, “pragmatic representations.” Pragmatic 
representations represent those parameters of the situations that are neces- 
sary for the successful performance of actions. Just what these parameters are 
could be debated: they may include the properties of the objects one acts 
upon, the properties of one’s own body, one’s bodily movement that is needed 
to complete the action, or maybe the properties of the goal state the action is 
aimed at (see Poincaré 1905/1958; Bach 1978; Brand 1984; Jeannerod 1997; 
Millikan 2004; Pacherie 2011; Nanay 2013a; Butterfill and Sinigaglia 2014 for 
very different proposals about this). 

I argued that whatever else needs to be represented about the situation that 
is necessary for the execution of the action, some egocentrically represented 
properties of the objects definitely do need to be represented. Pragmatic rep- 
resentations represent exactly these: shape, size, and spatial location of the 
distal objects the action is directed at. These properties are egocentric in the 
sense that they represent the shape, size, and spatial location of the distal 
objects as related to our own body: the size as related to our grip size, the 
spatial location as related to our own spatial location, and so on. In other 
words, these egocentric properties are relational properties: they are relations 
between the properties of the object and our own properties (say, the relation 
between the size of the cup and my grip size). 

Simple representations of this kind are involved in the performance of all 
bodily actions. These properties need to be represented in order for the agent 
to be able to perform the action at all. Suppose that the action is to pick up a 
cup. If I didn't represent the size of the cup, I would have no idea what grip 
size I should approach it with. If I didn’t represent its spatial location, I would 
have no idea which direction I should reach out towards. And so on. 

Pragmatic representations are genuine representations: they can misrepre- 
sent. If I represent the shape-property of the cup correctly, then I will be more 
likely to approach it with the appropriate grip size, which makes it more likely 
that my action will be successful. And if I represent the spatial location of the 
cup correctly, I will be more likely to reach out in the right direction, which, 
again, makes it more likely that my action succeeds. I also argued that prag- 
matic representations are genuine perceptual representations, but I will not 
need this stronger claim for the purpose of the argument in this book. But 
pragmatic representations represent relational features of perceived objects. 

Pragmatic representations do not need to be (and arguably they normally 
are not) conscious. But then how do we know what property (say, shape- 
property or spatial location property) they attribute to the cup? Clearly not by 
introspecting. We can infer what shape-property this pragmatic representa- 
tion attributes to the cup from the grip size I approach the cup with. And we 
can infer what spatial location property it attributes to the cup from the 
direction of my reaching. In other words, if the shape-property the pragmatic 
representation attributes to the cup changes, this affects my behavior, that is, 
the grip size of my approaching hand, directly. And if the spatial location 
property the pragmatic representation attributes changes, this also affects my 
behavior—the direction I reach out towards (see Jeannerod 1997 for a num- 
ber of case studies of how intervention on the pragmatic representation 
leads to observable changes in our behavior). In one famous experiment, in 
the middle of the performance of the reaching movement, the target was 
changed—either its spatial location or its size. And this influenced the action 
execution—the reaching movement changed direction during the execution 
of this action. The subjects were almost always unaware that anything had 
changed (Goodale et al. 1986; Pelisson et al. 1986; Paulignan et al. 1991). 

So far, I have talked about cases where there is a cup in front of me and 
I am looking at it as I am performing the action. These are perceptually 
guided actions. A (sensory stimulation-driven) perceptual state guides my 
action. When I pick up the cup, while looking at it, the visual feedback helps 
me to do so. I can adjust my movements in the light of my visual experience 
of how my action succeeds: if my initial reach was too forceful, I can adjust 
its course in response to the visual feedback (as we have seen in the previous 
paragraph, some of this happens unconsciously; see Paulignan et al. 1991). 

But I can also perform this action, and do so fairly successfully, without 
looking. I am looking at the cup, I then close my eyes, count to ten and then 
reach out to grab it. In this case, it is my mental imagery that guides my 
action. It is a special kind of mental imagery inasmuch as it attributes properties 
very similar to those that pragmatic representations attribute: egocentric spatial location 
properties (that allow me to reach out in the appropriate direction), egocen- 
tric size properties (that allow me to approach the cup with the appropriate 
grip size), and so on. And it is also, like pragmatic representation, a genuine 
representation, as it can misrepresent (Nanay 2013a, 2022a). I call this kind of 
mental imagery “pragmatic mental imagery.” 

Manipulating pragmatic mental imagery leads to observable behavioral 
changes in the same way as manipulating pragmatic representations leads to 
observable behavioral changes: if your pragmatic mental imagery attributed a 
different size-property to the cup, you would approach it with a different 
grip size. 

In this example, the pragmatic mental imagery was formed on the basis of 
your perceptual state: you looked at the cup and then you closed your eyes, 
but it is this visual information that the mental imagery is based on. But prag- 
matic mental imagery is more than just some kind of echo of sensory input. 
Suppose that you are in your bedroom and it is pitch-dark. You want to switch 
on the light, but you can’t see the switch. You are nonetheless in a position to 
switch it on given your memory of the room’s layout and the location of the 
light switch in it. In this case, your pragmatic mental imagery is formed on 
the basis of your memory. 

But pragmatic mental imagery can be triggered by completely non- 
perceptual means as well, for example if I blindfold you and then explain to 
you in great detail where exactly the coffee cup is in front of you, how far 
exactly to the left and how far exactly ahead, and so on. Your pragmatic men- 
tal imagery can still guide your action, but it does so without any (visual) 
input. In our everyday life many of our actions, especially our routine actions, 
like flossing, are in fact guided by pragmatic mental imagery. 

Pragmatic mental imagery has been actively used in medical training. 
Surgeons who perform rare operations are often trained with the help of 
visual mental imagery for these operations (as they can’t prepare by conduct- 
ing surgery of this kind). And such visual imagery training helps surgeons to 
be more precise with their procedure (Sanders et al. 2004, 2008; Immenroth 
et al. 2007). 

Pragmatic mental imagery also plays an important role in some of our 
pretense actions. Take the following pretend action. I pretend to raise a 
glass and take a sip from it, even though my hands are empty. How is this 
pretend action different from the actual action of taking a sip from an actual 
wine glass? Obviously, there is no glass in one case and there is a glass in the 
other case. But how are our mental processes different? What representa- 
tional state allows me to hold my hand and move it towards my mouth in 
the way that I do? 

According to the two most influential accounts of pretense, the representa- 
tional states that bring about pretense actions are either an actual (condi- 
tional) belief and an actual desire (Nichols and Stich 2003) or a “belief-like 
imagination” and a “desire-like imagination” (Currie and Ravenscroft 2002; 
Velleman 2000; Doggett and Egan 2007). While these accounts may explain 
some pretense actions, they are less well-suited for examples like taking a sip 
from an imaginary glass. The belief (or the belief-like imagination) that I am 
making a toast does not specify what grip size I should maintain while I am 
performing the action. Nor does the desire (or desire-like imagination). 

And we cannot rely on pragmatic representations either (whose job it 
would normally be to help us to have the appropriate grip size): they attribute 
egocentric shape, size and spatial location properties to perceived objects, but 
there is no perceived glass—I am raising my empty hand. My proposal is that 
the representational state that allows me to hold my hand and move it to my 
mouth is pragmatic mental imagery. 

When I am pretending to raise my glass with nothing in my hands, I pre- 
sumably have belief-like imagination that I am drinking a glass of wine (and 
maybe corresponding desire-like imagination), but in order for this belief- 
like imagination to have any influence on my actual movements, I also need 
to have pragmatic mental imagery that allows me to hold my fingers and 
move my hand in a certain way. This pragmatic mental imagery attributes 
properties like egocentric weight, shape, and spatial location properties to the 
imagined glass in my hand. We have pragmatic mental imagery of the 
weight-property, the shape-property, the size-property, etc. of the nonexistent 
glass, and the attribution of these properties guides my pretend action: it 
guides the way I hold my fingers (as if around a glass), the way I raise my hand 
(as if raising a glass), etc. This pretend action cannot be explained without 
appealing to pragmatic mental imagery. 

It is important to emphasize that I do not need to consciously visualize the 
glass in order to attribute various properties to it by means of mental imagery. 
Nor does this mental imagery need to be triggered voluntarily. But I do need 
to attribute egocentric shape, size, etc. properties to the glass by means of 
mental imagery—otherwise I would not know how to move my hand. 

In the example of pretending to take a sip from a nonexistent glass, my 
belief, my desire, as well as my pragmatic representation, are all “imaginary”: 
belief-like imagination, desire-like imagination, and pragmatic mental 
imagery. But in other cases of pretend actions, our pragmatic representation 
is not imaginary: we do not need pragmatic mental imagery. Here is an 
example: I am taking a sip from a glass of cheap and bad red wine, and 
I pretend that I am taking a sip from a glass of 2004 Brunello di Montalcino. 
I may be using belief-like imagination (and, presumably, desire-like imagi- 
nation), but the pragmatic representation that guides this action is exactly the 
same as it would be if I were not pretending. Pretense can happen without 
pragmatic mental imagery. 

This leads to a non-monolithic account of pretense actions. Some of our 
pretense actions can be explained with the help of belief-like (and desire-like) 
imagination. Some can be explained with the help of pragmatic mental imagery. 
And in some cases, we need to appeal to both kinds of imaginary states. 

Neil Van Leeuwen highlighted an important special case of pretense that he 
calls “semi-pretense”—a mental state somewhere in between pretending and 
performing a real action (Van Leeuwen 2011). His example is a scenario 
where two kids, who are watching some other kids jumping off the high dive, 
are evaluating the quality of the dives by holding up their fingers. Is this a real 
action? In some sense, it is: they are genuinely evaluating the dive of their 
friends. But it is also pretense inasmuch as they pretend to be judges who give 
points for each dive. It is not full pretense though—they do not hold up plac- 
ards with numbers, they merely use their fingers. 

Van Leeuwen argues that the two mainstream accounts cannot explain 
semi-pretense because there is no middle ground between beliefs and belief- 
like imagination (or between beliefs and conditional beliefs). 

My account can explain “semi-pretense” because it allows for the “integration 
of perception and imagination”; this should be clear enough in the 
context of this book. Sometimes, for example, when I am pretending to stab 
you with a sword with my hands empty, I attribute all the relevant proper- 
ties to the nonexistent sword by means of pragmatic mental imagery. Some 
other times—for example, when I am pretending to stab you with a sword 
and I in fact hold an umbrella in my hand—I attribute some of these prop- 
erties perceptually (its weight, for example), but I attribute others by means 
of mental imagery (for example, the property of where the end of the sword 
is and how sharp it is). In this case, some of the properties are attributed by 
my pragmatic representation, and some others by my pragmatic mental 
imagery. 


27 


Motor Imagery and Action 


What triggers the execution of actions? Suppose that there is a cup of tea next 
to your computer while you’re working. You want to take a sip, you have a 
belief that the tea is not too hot and it would quench your thirst, you have a 
(distal) intention to take a sip. But you're not doing it. And suddenly, you find 
yourself taking a sip. What happens in that moment when this action is trig- 
gered? What mental state is there at the moment of action execution that was 
not there a second before? I take these to be among the most important ques- 
tions of philosophy of action (see Brand 1979; Nanay 2014b). 

The question about what triggers actions also has serious implications for 
our everyday life and well-being. In the case of taking a sip of tea, I wanted to 
do so and I formed an intention to do so. The question was just how this 
desire and intention gave rise to the actual bodily movement. But there are 
other cases where the executed action goes against our desires and even our 
intentions. Akratic actions are obvious examples: next to your computer is the 
TV’s remote control, not a cup of tea. And you want to finish the grant pro- 
posal and have an all-things-considered intention to do so, but you nonethe- 
less find yourself switching on the TV. How is that action triggered? 

Addictions of various kinds raise the same problem (Brevers et al. 2012). 
Recovering addicts have a very strong desire not to relapse. But when they do 
relapse (when their “relapse actions,” as I will call them, are triggered), what 
triggers these actions? 

The concept of motor imagery can help us to address these questions. As 
we have seen in Chapter 26, motor imagery is cortical motor processing that 
does not directly trigger motor output. While there has been a fair amount of 
research in psychology and neuroscience on motor imagery in the last thirty 
years or so, it is only recently that we have started to understand the important role 
motor imagery plays in action initiation. And if, as these findings suggest, 
motor imagery plays an important role in action initiation, we can make 
progress not only in understanding action initiation in general but also in 
understanding what goes wrong in akratic actions and in relapse actions. 

The question of action initiation is widely studied in neuroscience and 
psychology. Neuroscientists of action make a distinction between the 
preparation for a movement and the execution of that movement. The set of 
findings I want to focus on here is about one major difference between these 
two phases of action execution: the inhibition of action during the prepara- 
tion for a movement and the lifting of this inhibition shortly before the execu- 
tion begins (see Porter and Lemon 1993 for an overview). This difference is at 
the segmental spinal level, that is, not in the brain, but in the spine. There is a 
sharp decrease of spinal reflexes during preparation for a movement (which 
prevents motor neurons from spontaneous firing) and an increase again 
shortly before execution (Requin et al. 1977; Bonnet and Requin 1982; 
Fourkas et al. 2006; see also Kyriakatos et al. 2011). 

Motor imagery, like action execution, but unlike action preparation, 
increases spinal excitability (Bakker et al. 1996; Bonnet et al. 1997; Li et al. 
2004; Guillot et al. 2007; Aoyama and Kaneko 2011). Further, motor imagery 
training increases spinal plasticity (Grospretre et al. 2019). So whatever 
increases spinal excitability is there both in motor imagery and in action exe- 
cution. This means that the increase in spinal excitability is not sufficient for 
triggering the action: in the case of motor imagery, we have an increase in 
spinal excitability, but no action performance. 

Given that both motor imagery and action initiation increase spinal 
excitability (and therefore the “readiness” to perform an action), one should ask 
how motor imagery might contribute to the triggering of the bodily movement. 

The relation between motor imagery and actual action performance has 
been investigated for a long time (see especially Marc Jeannerod’s work: 
Jeannerod 1994, 1997, 2006; see also McCormick et al. 2013). It has been 
known for decades that there is a substantial overlap between the brain 
regions involved in motor imagery and in action execution (see Miller et al. 
2010 for a summary). But the main emphasis of the research on the connec- 
tion between motor imagery and action performance has been on how motor 
imagery can help us to make our action performance more accurate (see the 
vast amount of research in sport psychology on this; Feltz and Landers 1983 
is a classic summary). What I want to focus on is a much more recent body 
of findings, which is not about how motor imagery can modify the ways in 
which actions will be performed, but about how it can help trigger action exe- 
cution (see Nanay 2020a for a longer version of this argument). 

And there are some important recent results that suggest that motor 
imagery can make it more likely that the bodily movement is triggered (most 
of the findings at the moment seem to be limited to some simple bodily 
movements only in healthy subjects; see Rodrigues et al. 2010; Stins et al. 
2015; but see also Schwoebel et al. 2002 and Fourkas et al. 2006). Further, 
incongruent motor imagery interferes with action execution (Ramsey et al. 
2010). These findings suggest that the initiation of actions is made more 
probable by having motor imagery of the performance of this action and it is 
made less probable by having motor imagery of some other actions (see also 
Nanay 2017c). 

Nothing in these empirical results suggests that motor imagery reliably 
leads to action execution. All that follows is that it makes the triggering of 
action execution more likely by pushing the spinal excitability further and 
further up. But the mere fact that motor imagery is a factor in what triggers 
actions is something that could have a significant impact on understanding 
the mechanism of action initiation. Thinking about the role of motor imagery 
in action initiation helps us to understand how akratic actions are triggered. 

You are working on your computer and suddenly the idea of watching TV 
instead pops into your head. And then you find yourself reaching for the 
remote. My claim is that one of the mental states that has contributed to the 
triggering of the action of reaching for the remote is motor imagery. As a 
result, one thing we can do if we want to resist the temptation of watching TV 
would be to manipulate our motor imagery (see Papies and Barsalou 2015; 
Cornil and Chandon 2016). 

The link between motor imagery and akratic actions is even more straight- 
forward in cases we might call “obsessive procrastination.” You know that you 
need to work on a grant proposal that is due tomorrow, but you are instead 
playing a video game. You know you need to stop, but you keep on playing. If 
we understand the role of motor imagery in action initiation, this is not sur- 
prising at all. When playing a video game, you already have your motor 
imagery engaged in the video game and this leads to the initiation of the 
action of playing another level, rather than getting up and going to your com- 
puter to work on the grant proposal. 

I should emphasize that these are supposed to be partial explanations. 
There are many mental states that are involved in performing akratic actions 
(Nanay 2020f) and I do not want to pretend that I can explain all of them. 
My aim is to highlight an important mental antecedent of akratic action that 
we may have more control over than other, less clearly understood motives 
of akratic actions. 

On a pragmatic note, it seems to follow from this that if you feel the 
temptation to reach for that remote control, then not imagining doing so (or 
imagining performing other actions) may help you to resist this temptation, 
whereas imagining doing so will increase the probability that you succumb to 
the temptation. 

1 Another relevant finding in this context is that congruent hand posture during motor imagery 
facilitates spinal excitability, whereas incongruent hand posture makes spinal excitability less likely 
(Vargas et al. 2004). 

And here we can plug in one of the most celebrated results of sport psy- 
chology about motor imagery. It has been found that the precision and even 
the strength of complex motor actions are increased merely by the subject 
looking at the object these actions are performed with or on. The explanation 
of this is that the mere perception of this object triggers motor imagery and 
this repeatedly triggered motor imagery contributes to the better (more accu- 
rate, more forceful) performance of this action (Feltz and Landers 1983; 
Bakker et al. 1996). 

What is relevant from these findings for our purposes is that merely per- 
ceiving an object, with which we are used to performing an action, triggers 
motor imagery of this action. So, seeing a remote control will trigger motor 
imagery of grasping it and pushing the on button. And merely seeing a glass 
of wine will induce motor imagery of lifting it up and taking a sip. 

So one way, simple but not always available, of reducing the chance of per- 
forming an action we do not want to perform is to make the objects that are 
required for performing this action perceptually unavailable (that is, to hide 
that remote or not to have Facebook open in your browser, for example). Or, 
if this is not an option, the same can be achieved by making these objects 
inaccessible by a well-trained motor routine. If we don’t perceive this object, 
the motor imagery is less likely to be activated. And if we do perceive it, but 
the motor routine is not well-trained, the motor imagery is, again, less likely 
to be activated. 

This proposal could also be taken to be continuous with some influential 
philosophical accounts of resisting temptation (that is, resisting the initiation 
of the tempting action). Richard Holton argues that it can be detrimental to 
our determination to resist temptation to think about the tempting action 
(Holton 2009, pp. 126ff.). The present proposal could be thought of as extend- 
ing this general approach. Rather than focusing on thinking about the tempting 
action (whatever that means), the aim here is to identify just what kind of 
mental processes would be needed to push us over the threshold of action 
initiation. And my answer is that this mental process is motor imagery. This 
also explains what could be dubbed, for the benefit of Fawlty Towers fans, the 
“Don't mention the war” phenomenon: often focusing on not doing something 
leads to the performance of the very action we are trying hard not to perform. 

One advantage of this view of the role of motor imagery in action initiation 
is that it can help us to explain some empirical findings about addiction 
treatment. The study I want to focus on is about alcoholics who were trained 
to use a joystick when presented with pictures of alcohol and of non-alcoholic 
beverages (Wiers et al. 2011; see also Palfai 2006; Wiers et al. 2010). Subjects 
in this experiment had to move the joystick away from themselves when pre- 
sented with pictures of alcohol and they had to move the joystick towards 
themselves when they saw a picture of non-alcoholic beverages. Subjects in 
the control group were either not trained in any way, or were trained to 
respond to some other, not alcohol-related feature of the picture. 

The result was that those who were trained to make avoidance movements 
in response to pictures of alcohol showed significantly more progress at recov- 
ery (Wiers et al. 2011). In some cases, even a single training session had a sig- 
nificant positive effect (see esp. Wiers et al. 2010). It is not clear how we can 
explain this effect—it was not clear to the experimenters who conducted these 
studies either. Wiers et al. (2011) hypothesize, very tentatively, that maybe 
emotions are involved (roughly, retraining the action tendencies leads to emo- 
tional change). But it is not clear how this connection would work and how 
such a change in emotions would lead to such rapid improvement in recovery. 

If we accept that motor imagery plays an important role in action initia- 
tion, we get a much more straightforward explanation. As we have seen, 
incongruent motor imagery interferes with action execution (Ramsey et al. 
2010). And the joystick exercise these subjects perform trains them to have 
motor imagery in response to pictures of alcohol that is incongruent with 
approach behavior. As a result, their action execution (of reaching for alcohol 
in relapse situations) is less likely to be triggered (see also the alternative 
explanation, anticipated in Wiers et al. 2011 and elaborated in Mylopoulos 
and Pacherie 2020, in terms of approach bias, which may be complementary 
to my explanation). 

This is a very promising way of treating addictions. One important marker 
of addiction is that addicts’ attention is captured by addiction-relevant stimuli 
(see Brevers et al. 2011 for a summary of the vast literature on this; see also 
Anderson and Yantis 2013 for how this fits into long-term value-driven atten- 
tional effects). And the term “addiction-relevant stimuli” here does not merely 
mean stimuli that are directly connected to the addiction (in the case of gambling 
addiction: the roulette table), but a much wider range of stimuli that would 
be somehow very distantly related to the addictive behavior (for example, the 
shirt you once wore in the casino, and so on). 

It is not an option to hide all possible addiction-relevant stimuli (because 
they are everywhere). So addicts perceptually encounter addiction-relevant 
stimuli all the time and their attention is captured by these stimuli. And the 
intense capture of the addict’s attention makes the triggering of motor 
imagery also more intense. So the only available option seems to be to repro- 
gram the motor imagery itself, which, as we have seen, is not an impossi- 
ble task. 

In short, perception can lead to unwanted motor imagery and unwanted 
motor imagery can lead to unwanted action. We can interfere with this proc- 
ess at various points: we can manipulate what we see, we can manipulate what 
kind of motor imagery perception gives rise to, and we can also manipulate 
what kind of action motor imagery contributes to. Understanding the role of 
motor imagery in action initiation can help us to manage the triggering of our 
unwanted (akratic, relapse) actions more efficiently. 

The picture of action initiation I outlined differs from the mainstream phil- 
osophical accounts of action in one important respect. In these mainstream 
views, all the causally efficacious mental states are mental states we have 
access to: mental states that we are aware of, be they beliefs and desires 
(Davidson 1980) or intentions (Searle 1983; Bratman 1987; Mele 1992). 

But the worry would be that this is not so when it comes to motor imagery. 
So one could argue that motor imagery is something that we merely postulate 
theoretically in order to explain some odd phenomena—it could be thought 
to be a theoretical entity, opening the door for various versions of antirealism 
about theoretical entities. 

My response is threefold. First, we often are aware of motor imagery. As we 
have seen, motor imagery may or may not be conscious. If it is conscious, it 
can be subject to introspection. And this introspective access to our motor 
imagery (again, bracketing worries about just how reliable introspection is) 
could justify one’s beliefs about the expected success of the action to be per- 
formed. In this sense, motor imagery can not only be conscious, it could also 
have significant epistemic import. 

This response (that motor imagery may or may not be conscious) addresses 
a potential pushback, namely that philosophy of action should not take into 
consideration a merely subpersonal state that causally contributes to action 
execution (after all, there are many of these, along the motor nerve). The 
answer is that motor imagery is not a merely subpersonal state. Even if we 
accept the personal/subpersonal distinction as unproblematic (I myself don't 
think we should), a state that can become conscious, if attention is allocated 
to it, is not a subpersonal state. In other words, motor imagery, although it 
can be unconscious, is a bona fide mental state (an analogy: perceptual states 
can also be, and often are, unconscious; nonetheless, it would be odd to deny 
that perceptual states are bona fide mental states). 

Second, I don’t see any problem with postulating mental states if the only 
way in which we can explain the agent’s complex behavior is by postulating 
these mental states. We have extremely rich and varied evidence that our 
introspective access to our own mind is limited and often systematically mis- 
leading. But then we should not expect that we are aware of all the crucial 
building blocks of the mind and of all the causal ingredients of action 
performance. 

Finally, the fact that some causally relevant components of action initiation 
are unconscious is not a bug, but a feature (Nanay 2014b). There are many 
actions where we are not aware of whatever moves us to act. Impulsive actions 
would constitute one kind of example. We just find ourselves acting—we have 
a sense of ownership of our action, but we do not have a sense of having initi- 
ated it. Akratic actions, as we have seen, would be another. 

But there are even more prosaic cases. You're lying in bed in the morning, 
having hit the snooze button three times already and you know you need to 
get up, but somehow you just don’t. And then all of a sudden, you find your- 
self getting up. You are not aware of the state that moved you to act. Here is a 
literary example by Robert Musil: 


I have never caught myself in the act of willing. It was always the case that I 
saw only the thought—for example when I’m lying on one side in bed: now 
you ought to turn yourself over. This thought goes marching on in a state of 
complete equality with a whole set of other ones: for example, your foot is 
starting to feel stiff, the pillow is getting hot, etc. It is still a proper act of 
reflection; but it is still far from breaking out into a deed. On the contrary, I 
confirm with a certain consternation that, despite these thoughts, I still 
haven’t turned over. As I admonish myself that I ought to do so and see that this 
does not happen, something akin to depression takes possession of me, 
albeit a depression that is at once scornful and resigned. And then, all of a 
sudden, and always in an unguarded moment, I turn over. As I do so, the 
first thing that I am conscious of is the movement as it is actually being per- 
formed, and frequently a memory that this started out from some part of the 
body or other, from the feet, for example, that moved a little, or were uncon- 
sciously shifted, from where they had been lying, and that they then drew all 
the rest after them.² 


2 Robert Musil: Diaries. New York: Basic Books, 1999, p. 101. See also James (1890) and Goldie 
(2004, pp. 97-8). 

Many of our actions are like this. And we should not dismiss these cases as 
rare instances of unimportant actions. Some of our actions of great impor- 
tance are also like this: going in for that first kiss (assuming you don’t do it by 
counting to three), for example. 

Any philosophical account of action needs to take actions of this kind seri- 
ously. But if so, then we need to postulate a mental state that we do not have 
to be aware of. So we could turn the tables and argue that it is precisely those 
accounts of action, which do not posit causally efficacious mental states that 
we are not aware of, that are problematic. 

To go back to the structural analogy between mental imagery and motor 
imagery, one way of summarizing the philosophical upshot of the proposal 
outlined in this chapter is that just as understanding sensory imagery is a cru- 
cial part of understanding perception per se, understanding motor imagery is 
an equally crucial part of understanding action per se. Just as perception 
would be very different if mental imagery played no role in it (in amodal 
completion as well as multimodal perception), action would also be very dif- 
ferent if motor imagery played no role in it. Philosophy of action should take 
the concept of motor imagery seriously. And as motor imagery, as we have 
seen, involves sensory mental imagery, this provides yet another reason why 
mental imagery plays an important role in explaining actions. 


Cognitive Dissonance 


Cognitive dissonance was originally defined as a “nonfitting relation between 
cognitions,” which gives rise to an unpleasant feeling (Festinger 1957, p. 3). 
And this unpleasant feeling leads to a change in one’s attitude. To update the 
terminology a bit, if you have two representations that clash with one another, 
this can lead to a negatively valenced state, which then, in turn, leads to a 
change in your attitude. 

Consider the following example: 


Subjects had to undergo some kind of initiation ritual when they joined a 
discussion group. One of these rituals involved enduring minor electric 
shocks. The more painful these shocks were, the more strongly the subjects 
felt towards the group (Gerard and Mathewson 1966; see also Aronson and 
Mills 1959; Ma et al. 2014). You suffer more to be part of a group and then 
you are more loyal to them. This can be explained as a way of dealing with 
cognitive dissonance: if the group you just joined is not great, then what did 
you endure the severe pain for? For nothing? For being part of a mediocre 
group? This is the conflict that creates an unpleasant feeling and you get rid of 
this unpleasant feeling by denying that the group is mediocre. If you suffered 
so much to join, then the group must be great. In fact, it must be fantastic. 


There are hundreds of cognitive dissonance experiments (for some classic 
studies, see, for example, Festinger et al. 1956; Brehm and Cohen 1962; 
Festinger and Maccoby 1964; see also Thibodeau and Aronson 1992 and 
Cooper 2007 for overviews). In the original formulation of cognitive dissonance, 
the “nonfitting relation” stands between two different “cognitions.” 
Thus the name, cognitive dissonance. But it is not clear what kind of mental 
states “cognitions” are meant to be. According to the dominant view in philo- 
sophical discussions of cognitive dissonance, these representations are beliefs. 
I will argue against this view and then outline an alternative involving mental 
imagery. And I will show that this alternative account has significant explana- 
tory benefits over the belief account. 
According to the mainstream view of cognitive dissonance, especially in 
philosophy (Quilty-Dunn and Mandelbaum 2018, 2019; Mandelbaum 2019; 
Bendana and Mandelbaum 2021; Quilty-Dunn ms), but also in psychology 
(Aronson 1992), the conflicting representations are beliefs. A key component 
of this explanatory scheme is the “self-concept” (Aronson 1969, 1992). 

The “self-concept” is a set of beliefs that one holds about oneself. These 
beliefs are very deeply embedded in one’s cognitive economy. As Aronson 
writes, “at the very heart of dissonance theory, where it makes its clearest and 
neatest prediction, we are not dealing with any two cognitions; rather, we are 
usually dealing with the self-concept and cognitions about some behavior” 
(Aronson 1969, p. 27). 

More specifically, these are beliefs that one is competent (smart, not an idiot), 
beliefs that one is good (moral, not a jerk) and beliefs that one is stable (not 
changing their mind randomly). Following the philosophical terminology 
(Mandelbaum 2019; Bendana and Mandelbaum 2021; Quilty-Dunn ms), 
I will call beliefs of this kind core beliefs. Core beliefs are supposed to be the 
cognitive center of gravity in our mind. It is very difficult to change them 
(even if there is conflicting evidence) and they color all our cognitive processing 
(Mandelbaum 2019; Bendana and Mandelbaum 2021). 

If we accept the existence of core beliefs, then we can give an explanation of 
cognitive dissonance in terms of a logical contradiction between the beliefs 
involved. The general idea is that while logical contradiction between any two 
beliefs is not sufficient for cognitive dissonance, the logical contradiction 
between a core belief and another belief is indeed sufficient. And logical con- 
tradiction is also necessary, although this often involves not two but at least 
three beliefs. Here is an example of how this kind of explanation would go in 
the initiation experiment with regard to the underlying reasoning (I changed 
the wording slightly to fit the example in this chapter). 


I just underwent severe shocks to join a discussion group and this made me 
think that the group I joined is great: 

(Premise 1) I put a lot of effort into joining the [discussion group] 

(Premise 2) Only an idiot would put a lot of effort into joining the [dis- 
cussion group] without liking the [discussion group] 

(Premise 3) I am not an idiot 

(Conclusion) I must like the [discussion group] (and the appeal of the group 
is raised) (Bendana and Mandelbaum 2021; see also Quilty-Dunn ms) 
I will call this explanatory scheme the “cognitive dissonance syllogism” as, 
according to the proponents of the belief account, each time we undergo cog- 
nitive dissonance, we go through syllogistic reasoning of this kind (granted, 
this reasoning is mostly unconscious according to the belief account; see 
Lieberman et al. 2001; Quilty-Dunn ms). 

Premise 3 of the syllogism is the core belief, and this is the premise that 
remains unshaken, leading to a change in our attitude about the discussion 
group. And some of the empirical support the proponents of the belief 
account appeal to in defense of this cognitive dissonance syllogism is about 
the importance of core beliefs in this syllogism. More specifically, if the core 
belief is taken out or weakened (by giving the subjects bogus tests that suggest 
that they are worthless or stupid), this leads to the collapse or at least the 
weakening of the cognitive dissonance effect (Glass 1964; Stone and Cooper 
2003). Thus, the cognitive dissonance syllogism only works if the core beliefs 
are also part of it. 

While this is an extremely elegant way of making the original idea of cog- 
nitive dissonance more precise, I will argue that there are empirical problems 
with it. I will then go on to outline an alternative involving mental imagery, 
which fares better with these problems. 

The first empirical objection is about the location of the neural underpinnings 
of cognitive dissonance. Given the complexity of the phenomenon, a number of 
different brain regions are involved, especially ones that are associated with 
emotions. This would be consistent 
with the belief account. But more surprisingly, the activation of the primary 
visual cortex is very different in cognitive dissonance versus no dissonance 
conditions (keeping all other parameters fixed; see Izuma et al. 2010; de Vries 
et al. 2014). If cognitive dissonance is all about beliefs and the contradictions 
between them, this finding is very difficult to explain. 

The second empirical problem is that, in a much-hyped set of studies, it 
was found that music reduces cognitive dissonance (Masataka and Perlovsky 
2012; Perlovsky et al. 2013). Again, it is difficult to see how the belief account 
could explain this as music does not have any effect on our logical inferences. 

The third empirical problem is that we have substantial empirical evidence 
that some animals are also capable of cognitive dissonance (Harmon-Jones 
et al. 2017). But even if animals have beliefs (which would be difficult to deny, 
even though some philosophers have done so, see Davidson 1980), we have 
no evidence that they have anything reminiscent of a “self-concept” or core 
beliefs. Nor do we have any evidence that they are capable of the logical rea- 
soning that would be required by the cognitive dissonance syllogism. 
These empirical problems with the belief account point in the direction of 
the alternative I want to offer to the belief account. The main idea is that the 
conflicting representations are not beliefs, but imagistic representations. 
Imagistic representations are mental representations that represent imagisti- 
cally (in an imagistic format, see Chapter 6). Mental imagery is imagistic rep- 
resentation, as is perception, but there may be other kinds of imagistic 
representations. And my claim is that the representations involved in cogni- 
tive dissonance are imagistic representations. I will argue that this way of 
understanding cognitive dissonance has significant explanatory benefits over 
the belief account. 

As we have seen in Chapter 23, mental imagery can be, and is very often, 
emotionally charged. If the representations that are involved in cognitive dis- 
sonance are emotionally charged, then we do not need any conflict between 
such representations in order to generate the emotional state that character- 
izes cognitive dissonance. In the original conceptual framework of cognitive 
dissonance, the conflict, or “nonfitting relation” between two representations 
yields a negative emotion. But as long as we allow for emotionally charged 
representations, we do not need to postulate two representations and a “non- 
fitting relation” between them in order to explain the emotional impact of 
cognitive dissonance. All we need is to postulate one emotionally charged 
imagistic representation. And, as we have seen in Chapter 23, we have plenty 
of independent reasons to posit such emotionally charged imagistic 
representations. 

We can now put together the alternative explanatory scheme of cognitive 
dissonance. As a result of the prompt or the experimental setup, you come to 
attend to a negatively valenced imagistic representation. Attending to this 
negatively valenced imagistic representation amounts to a negative emotional 
state. This negative emotion, in turn, makes you behave in a way that weakens 
this negative emotion—a simple move here would be to attend to some- 
thing else. 

And we know fairly well how this attending to something else works from 
research on “experiential avoidance.” Experiential avoidance happens “when a 
person is unwilling to remain in contact with particular private experi- 
ences...and takes steps to alter the form or frequency of these experiences or 
the contexts that occasion them” (Hayes et al. 2004; see also Tolin et al. 1999; 
Wieser et al. 2009). And this is exactly what happens in the last step of cogni- 
tive dissonance resolution. 

Let us go through the initiation example I used earlier to introduce cogni- 
tive dissonance to demonstrate how this explanatory scheme works: 


You undergo the electric shocks to join the boring discussion group and 
then you are asked whether you think the group is great. When you need to 
answer this question, you need to consider the two options: great or not so 
great. If you consider that the group is not great, this brings to mind an 
imagistic representation (a memory image) of you enduring the electric 
shocks, all in order to join a mediocre group. Not a pleasant image. So you 
divert your attention to the other option, namely, that the group you joined 
is great, and choose that option. 


This explanation is simpler than the ones given in the framework of the belief 
accounts. Again, the main difference is twofold: the nature of the representa- 
tions (imagistic vs. beliefs) and the origins of the negative emotional state 
(the emotionally charged imagistic representation itself versus some form of 
“nonfitting relation” between two beliefs). According to the imagistic repre- 
sentation account, the resulting attitude change is brought about by an atten- 
tional shift, roughly along the lines of experiential avoidance. 

I will now argue that the imagistic representation account can handle all 
the empirical objections raised against the belief account. I raised three 
empirical objections to the belief account. All of these could be taken to be 
empirical support for the imagistic representation account. 

First, given that mental imagery is known to activate the early visual corti- 
ces, including the primary visual cortex, the imagistic representation account 
predicts that the primary visual cortex behaves differently in cognitive disso- 
nance. And this is exactly what was found. Second, the imagistic representa- 
tion account can also explain why music interferes with cognitive dissonance, 
as both music processing and mental imagery compete for the same early 
cortical resources. 

Third, how could the imagistic representation account accommodate cog- 
nitive dissonance reduction in animals? We have seen that this is difficult to 
make sense of within the framework of the belief account. Even if we can 
attribute beliefs to animals, we have no independent evidence that would justify 
attributing to them a complex inferential apparatus of the kind required by the 
cognitive dissonance syllogism. There is no indication of anything reminis- 
cent of core beliefs either. On the other hand, we have independent evidence 
that animals have emotionally charged imagistic representations (see Kremer 
et al. 2020 for a summary; see also Nanay 2020c). The ability to explain cognitive 
dissonance among non-human animals is a major point in favor of the 
imagistic representation account (and against the belief account). 
Finally, after having railed against the belief account, I should end on a 
pluralistic note. Cognitive dissonance is a diverse phenomenon and it is far 
from clear that one and only one explanatory scheme can explain all of its 
very different instances. So, it is very much possible that some instances of 
cognitive dissonance can be explained by the belief account. But not all of 
them can be. And thinking of cognitive dissonance as a reaction to emotion- 
ally charged imagistic representations could explain many central cases. 

Implicit Bias 


Some of our behavior is biased. By this I mean that the behavior goes against 
our reported beliefs. And often we are not fully aware of these biases. The 
question I want to raise in this chapter is about the mental representation that 
is responsible for this biased behavior. This representation mediates between 
the trigger and the biased behavior. And my claim is that this representation 
is neither a propositional attitude nor a mere association (as the two major 
accounts of implicit bias would claim). It is mental imagery (see Nanay 2021b 
for a more elaborated argument for this claim). 

I am interested in the representation that mediates between the trigger 
and the biased behavior, which I will refer to as “biasing representation.” But 
this picks out a lot of things. When people are reluctant to throw darts at the 
picture of the face of a loved one or to drink lemonade from a sterilized 
bedpan (Rozin et al. 1986; see also Gendler 2008), we also get behavior that 
goes against our beliefs (which represent these actions as harmless). The 
term “implicit bias” is more often used to describe a more specific phenome- 
non: representation about certain racial and gender groups (as well as other 
social groups) that influences our behavior in such a way that it goes against 
our beliefs (Greenwald et al. 1998; Greenwald and Nosek 2009; Dunham 
et al. 2008). 

Here is an example. In an elevator you might stand a little bit farther away 
from people whose skin color is different from yours. You may or may not be 
aware of this. But you think of yourself as someone who does not make dis- 
tinctions between people because of their skin color. So your behavior is 
biased in the sense that it goes against your belief. And the mental representa- 
tion (presumably about people with various skin colors) that is responsible 
for this behavior is what this chapter is about. 

Implicit bias is a genuinely heterogeneous phenomenon (see Holroyd and 
Sweetman 2016 for a detailed taxonomy; and Johnson 2020 for even more 
heterogeneity). The two main candidates for biasing representations in the lit- 
erature are the following. This biasing representation might be an association— 
between a specific skin color and a specific trait, say, being dangerous. Or this 
biasing representation might be a propositional attitude—an attitude towards 
the proposition that people with this specific skin color tend to be dangerous. 
My aim is to carve out and defend a third option, according to which this 
representation is mental imagery: perceptual processing that is not triggered 
directly by sensory input. I will argue that this view captures the advantages of 
the two standard accounts without inheriting their disadvantages. 

In short, empirical findings show that the biasing representation would 
need to be both sensitive to semantic content and insensitive to logical form. 
But associations are not sensitive to semantic content. And propositional atti- 
tudes are not insensitive to logical form. I will argue that mental imagery is 
much better suited to fulfill the theoretical role of the biasing representation: 
it is sensitive to semantic content (unlike associations) and insensitive to logi- 
cal form (unlike propositional attitudes). My aim is not to completely dismiss 
the associationist or the propositionalist account of implicit bias (or both), 
but rather to reframe the debate by adding an extra important ingredient to 
any explanation of implicit bias (be it associationist or propositionalist): 
mental imagery. 

Some clarifications: there is emerging evidence that implicit bias may not 
be unconscious: we may be more aware of our “implicit” attitudes than the 
initial bias findings might suggest (Nier 2001; Ranganath et al. 2008; Hahn 
et al. 2014; Machery 2016; Toribio 2018; Berger 2019). Nothing I say in this 
chapter takes sides on this issue. Further, there have been some debates about 
just what the most highly publicized indicator of implicit bias, the online 
freely available Implicit Association Test, shows or does not show (Forscher 
et al. 2016). As I take the Implicit Association Test to be only one of many 
experimental procedures that aim to demonstrate implicit bias, I will set this 
debate aside. If the reader is skeptical of the Implicit Association Test, this is 
not a reason to be skeptical of the broader phenomenon of biased behavior 
in general. 

Let’s suppose that your implicit bias makes it more likely that when you 
think of a caregiver, you think of a woman. This is, in fact, a very widespread 
example of implicit bias. The question is: what is the underlying biasing rep- 
resentation? The classic candidate is association. You have probably seen 
more female caregivers than male caregivers. And, following the mechanism 
of classical conditioning, you formed an association between being a caregiver 
and being a woman. One way to think about associations is as some 
kind of connection strength in your mind between the concept of being a 
caregiver and the concept of being a woman. When one concept is activated, 
the other one is very likely to be also activated. So when you hear someone 
talk about a caregiver, this gives rise to you thinking of a woman. Association 
is supposed to be quick, not under our voluntary control and, according to 
many (see Mandelbaum 2016 for a summary), symmetrical (it goes both from 
caregiver to woman and vice versa). 

The alternative view is that the underlying biasing representation is a prop- 
ositional attitude—typically a belief (Levy 2015; Mandelbaum 2016—some 
psychologists also often talk about propositions in this context, although what 
they mean by this tends to be something very different, see De Houwer et al. 
2001; De Houwer 2009, 2011, 2019).¹ So you have a propositional attitude 
that caregivers are (likely to be) women. And it is this propositional attitude 
that explains your biased behavior. In the case of propositional attitudes, the 
relation between being a caregiver and being a woman is not symmetrical. 
The propositional attitude that caregivers are (or tend to be) women is different 
from the propositional attitude that women are (or tend to be) caregivers (see 
esp. Mandelbaum 2016). 

I will argue that the biasing representation is neither an association nor a 
propositional attitude: it is mental imagery. 

On the one hand, implicit bias is sensitive to the semantic content of our 
representations (Mandelbaum 2013, 2016; see also Nanay 2021b). It is sensi- 
tive to the content of representations other than the biasing representation. 
The content of the biasing representation is combined with the content of 
other representations, which then produces the biased behavior (Rozin et al. 
1990; Sechrist and Stangor 2001; Gawronski et al. 2005; Newman et al. 2011; 
Cone and Ferguson 2015). Here is one example: People generally pay more 
money for clothes worn by celebrities than for identical clothes that are fresh 
off the shelf. But experiments show that they pay a little less if these clothes 
were washed since the celebrity wore them (Newman et al. 2011). This is 
biased behavior. We have two representations here. The first one (A) is the 
biasing representation about the connection between the clothes and the fact 
that they were worn by a celebrity. And this explains why we pay a lot of 
money for clothes worn by celebrities. But we also have another representation 
(B) about the connection between clothes and washing. And our biased 
behavior, namely that we pay less money for washed celebrity clothes than for 
unwashed celebrity clothes, is explained by the interaction between the 


1 What De Houwer means by proposition is very different from what philosophers mean by propo- 
sitional attitudes. Proposition, for De Houwer, means any representation that represents a relation 
(De Houwer 2009, 2011). Needless to say, many kinds of representations would qualify according to 
this definition, besides propositional attitudes. Some perceptual states would count, as would mental 
imagery. 
biasing representation (A) and the representation about washing (B). In short, 
biasing representations can enter into content-sensitive transitions. 

On the other hand, implicit bias is not sensitive to logical form (Madva 
2016; see also Nanay 2021b). So, exposure to sentences like “It is not true that 
old people are bad drivers” strengthens implicit bias about old people's bad 
driving as much as exposure to sentences like “Old people drive badly” does 
(Gawronski and LeBel 2008; Deutsch and Strack 2010; Deutsch et al. 2009). 

In light of these findings, both of the classic accounts of implicit bias are 
problematic. The biasing representation would need to be both sensitive to 
semantic content and insensitive to logical form. But associations are not sen- 
sitive to semantic content. And propositional attitudes are not (normally) 
insensitive to logical form. I will argue that mental imagery is much better 
suited to fulfill the theoretical role of the biasing representation: it is sensitive 
to semantic content (unlike associations) and insensitive to logical form 
(unlike propositions). 

My claim is that the biasing representation is mental imagery. This mental 
imagery is often emotionally charged, often action-guiding and often uncon- 
scious. It is always involuntary. 

Given that inference is a relation between propositional attitudes, and 
mental imagery is not a propositional attitude (as it has imagistic content), 
mental imagery does not enter into inferences. But not all content-sensitive 
transitions between mental states are inferences and there can be content- 
sensitive transitions between mental states with imagistic content. 

Mental imagery can lead to, and even justify, various other mental pro- 
cesses in a content-sensitive manner. Here are two examples: First, you are 
trying to wrap a box in gift wrap (see Chapter 24). You look at the box, you 
look at the gift wrap and how big a piece you tear off depends (in a content- 
sensitive manner) on your exercise of mental imagery. Second, you are play- 
ing snooker or billiards and you need to sink a ball at the other end of the 
table, with a lot of other balls in the way, but you figure out a way of making 
the cue ball ricochet twice before hitting the ball exactly from the right angle. 
In both of these two examples, there are content-sensitive transitions between 
mental imagery and other mental processes, but neither of them is an infer- 
ence: these transitions are not mediated by beliefs or other propositional 
attitudes. And neither of them is an association (cf. Mandelbaum and Quilty- 
Dunn 2019). 

Taking the biasing representation to be mental imagery can explain many 
examples of biased behavior. Take the example I started the chapter with, 
when subjects are reluctant to drink lemonade from a sterilized bedpan. The 
perceptual state of seeing the yellow liquid in the bedpan-shaped drinking 
vessel triggers the mental imagery of urine and it is this biasing representa- 
tion that explains our reluctance. The imagery itself does not have to be con- 
scious (although it might be). It is emotionally charged (presumably the 
emotion is disgust). And it is action-guiding in the sense that it interferes 
with our action. 

Here is another example (from Gendler 2008). The chef reorganizes her 
kitchen. The cleaver used to be above the dishwasher, but it is now next to the 
stove. She knows this: she moved the cleaver from here to there herself. But in 
the rush of preparing a meal, she still reaches to where the cleaver used to be— 
above the dishwasher. This is biased behavior: it goes against her beliefs. And 
we can explain this in terms of the mental imagery that she has of the cleaver 
above the dishwasher. Again, this mental imagery does not have to be con- 
scious. But, in this example, it is very much action-guiding mental imagery—it 
guides the chef’s action the same way as the mental imagery of the light switch 
guides your action in your pitch-dark bedroom (see Chapter 26). 

Could we explain the chef’s behavior in terms of an unconscious belief? 
No. We know from a vast number of studies in the neuroscience of “attentional 
templates” that visual actions of the kind performed by the chef amount 
to early cortical processing in the visual system that is not triggered by sen- 
sory stimulation—in short, to mental imagery (Keogh and Pearson 2021; see 
also Chapter 10). The belief view does not explain this. The mental imagery 
view does. 

And the same explanatory scheme also applies to biased behavior concern- 
ing other racial and gender groups. Subjects (who are not black) are more 
likely to misperceive a tool as a gun if a black person holds it than if a white 
person does so (Payne 2001; see also Siegel 2020). Here, again, the perceptual 
state of a black person holding a wrench gives rise to the mental imagery of a 
black person holding a gun. Again, this mental imagery does not have to be 
conscious—and when white people rate black people as more dangerous, it is 
possible that the mental imagery that grounds these judgments is not con- 
scious. The same is true of the biased behavior of standing further away from 
some people than others in the elevator. 

It is easy to see that mental imagery fits the profile of biasing representation 
we identified above. Given that implicit bias is insensitive to logical form, we 
need biasing representations that are also insensitive to logical form. And 
mental imagery is a good candidate for this as well. And as implicit bias 
clearly depends on our background beliefs and other mental states, this rules 
out associationism. But it does not rule out the view according to which the 
biasing representation is mental imagery, as mental imagery can also be combined 
with beliefs. 

Mental imagery is very much sensitive to semantic content and especially 
top-down influences. If I visualize a cat, the features of the cat I visualize very 
much depend on what kinds of cats I have seen in my life and also on my 
background beliefs and knowledge about cats. The chef’s mental imagery of 
the cleaver depends on previously stored memories. And we have plenty of 
empirical evidence about how mental imagery can be influenced in a top- 
down manner (see Chapter 11). 

It is important to note that this dependence of mental imagery on top- 
down information does not mean that the content of mental imagery is con- 
ceptual content or that a concept is attached to the mental imagery. Mental 
imagery is perceptual processing, which may or may not be influenced in a 
top-down manner. But even if it is, this does not entail that its content is con- 
ceptual. To take the example of the bedpan, the mental imagery of the yellow 
liquid, which is triggered by the perception of the bedpan, does not have to 
have concepts like “urine” attached to it. It is enough if the valence of the 
mental imagery is influenced in a top-down manner (by its association with 
urine). Mental imagery is influenced by our concepts—it does not have to be 
conceptual. 

Finally, a somewhat odd feature of the implicit bias literature in philosophy 
is that it often focuses on the relation between concepts: the concept of being 
a homemaker and the concept of being a woman, for example (see Del Pinal 
and Spaulding 2018 for discussion). But this focus is in tension with the vast 
majority of empirical findings about implicit bias, where the trigger of biased 
behavior is a perceptual state. And the biasing representation is the represen- 
tation that mediates between the perceptual trigger and the biased behavior. 
My account has a wider range of explanations of the perceptual (or quasi- 
perceptual, if you wish) nature of this mediating biasing representation than 
the classic views. The classic views would be committed to saying that the 
perceptual state triggers a concept, and that concept is either associated with, 
or is propositionally related to, another concept and this other concept is 
responsible for the behavior. We have both empirical and conceptual reasons 
to think that perception is not always linked to actions by means of concepts 
(see Jeannerod 1997 and Nanay 2013a for summaries). If so, then my account 
can explain the mediation between perception and action without any direct 
appeal to concepts. 

One may wonder whether the mental imagery account is a genuine alternative 
to the associationist and propositionalist accounts. After all, the trigger 
somehow leads to mental imagery and the mental imagery somehow leads to 
action. So, one could argue that even if we accept the mental imagery account, 
the nature of these transitions is still an open question, and these transitions 
can be explained either in an associationist or in a propositionalist manner. 
This is a fair point, but it should also be pointed out that the question about 
how a perceptual state gives rise to mental imagery and then how mental 
imagery turns into behavior is a very different question from the one about 
what kind of connection between concepts leads to biased behavior. So even 
if the associationism versus propositionalism debate is not put to bed entirely, 
we have made some progress. 

Take microbehavior, for example. One striking phenomenon often dis- 
cussed under the heading of implicit bias is how our biasing representation 
influences our microbehavior (Chen and Bargh 1997; Bessenoff and 
Sherman 2000; Wilson et al. 2000; McConnell and Leibold 2001; see also 
Levy 2015 for a summary)—not what answer we choose in a questionnaire, 
but, for example, how much we look in the eyes of people with different skin 
color, how far away we stand from them in an elevator and so on. And here 
the mental imagery account has a real advantage. If the biasing representation 
is (pragmatic, action-guiding; see Chapter 26) mental imagery, then we have a 
direct and straightforward way of explaining how little differences in one’s 
(action-guiding) mental imagery are responsible for little differences in one’s 
behavior. 

There are some structurally similar accounts in the implicit bias literature, 
and I need to contrast my account with them. I will focus on two of these, 
Neil Levy’s patchy endorsement account (Levy 2015) and Ema Sullivan- 
Bissett’s unconscious imagination account (Sullivan-Bissett 2019). 

The general moral is that both Levy’s concept of patchy endorsement and 
Sullivan-Bissett’s concept of unconscious imagination are far into the
propositionalist side of the divide. Levy explicitly claims that the biasing
representations are “patchy endorsements,” which “have some propositional structure”
(Levy 2015, p. 816) and that they “feature in some inferences” (Levy 2015, p. 816). 
Mental imagery does not have propositional structure (not even some!) and it 
does not feature in inferences. 

Similarly, Sullivan-Bissett’s view, according to which the biasing represen- 
tation is unconscious imagination, is more permissive towards the proposi- 
tionalist than mine as she would allow at least some of the unconscious 
imaginative episodes to be propositional attitudes. Imagination is not the 
same as mental imagery—imagination is a very specific exercise of mental 
imagery. And while there are good empirical and theoretical reasons to think
that mental imagery can be, and often is, unconscious, it is much more
controversial to say that imagination can be unconscious (see Kind 2001 for a
classic argument against this view, but see also Brogaard and Gatzia 2017, 
Church 2008, as well as Chapter 22). 

The question about what these biasing representations are is crucial not 
just out of theoretical interest. If we want to try to eliminate implicit bias, very 
different procedures would be needed depending on what these biasing
representations are. Further, as we will see in Chapter 30, some of the most
efficient ways of counteracting implicit bias involve mental imagery. This gives
us yet another strong reason in favor of the view that the biasing representa- 
tion of implicit bias is mental imagery. 


30 
Clinical Applications of Mental Imagery 


Mental imagery is a crucial ingredient of a wide variety of mental phenomena. 
Understanding the role of mental imagery in these mental processes can also 
help us to do something about the malfunctioning of these processes. This 
chapter is about the use of mental imagery in the service of treating various 
negative conditions in clinical practice. 

An important development in various branches of psychiatry is to manipu- 
late the mental imagery of patients, in order to improve their condition, by 
means of techniques such as “imaginal exposure,” “systematic desensitization,”
and “imagery rescripting.” There are reports of the success of this methodology
in the case of mental disorders ranging from bipolar disorders,
schizophrenia, and post-traumatic stress disorder to obsessive compulsive 
disorder and depression (Holmes et al. 2010; James et al. 2015; Murphy et al. 
2015; Clark et al. 2016; Slofstra et al. 2016; see Pearson et al. 2015 for a 
summary). 

Take post-traumatic stress disorder, as an example. The main symptom of 
post-traumatic stress disorder is the recurring and involuntary negative mental 
imagery of the traumatic event. When soldiers come back home after serving 
in war zones, for example, vivid and extremely negative mental imagery 
(in various sense modalities) is often triggered by various sensory stimuli 
(proverbially by fireworks, but also, for example, the smell of a barbecue) (see, 
for example, Clark and Mackay 2015). 

Another example is depression, where one of the most important indica- 
tions of the level of depression is the lack of future positive mental imagery 
(see Ji et al. 2017 for a summary). Schizophrenia has been shown to be associ- 
ated with more vivid negative mental imagery, which is responsible for 
changes in perception (Maróthi and Kéri 2018). And mental imagery has also 
been taken to be a central component of (especially non-restrictive) eating 
disorders as well as many forms of addiction (Sommerville et al. 2007; Kadriu 
et al. 2019). Finally, negative imagery plays a crucial role in anxiety disorders 
(Hirsch and Holmes 2007). 

Given the centrality of mental imagery in these conditions, a number of 
techniques have been developed to change the subjects’ negative mental
imagery. One way of doing so is by weakening the subjects’ mental imagery
(by making them perform tasks that compete for the same mental resources).
A famous example of this is to have subjects with recent¹ trauma play Tetris, a
mental imagery-involving game, which competes with the traumatic mental 
imagery, thereby preventing the traumatic event from being consolidated in 
memory (Holmes et al. 2010). 

A related, and much-hyped, technique is “eye movement desensitization
and reprocessing,” used to treat anxiety as well as post-traumatic stress
disorder and addiction. The subjects have to recall the traumatic or
anxiety-producing event while performing various directed eye movement exercises
(see van den Hout et al. 2013 for details). While “eye movement desensitiza- 
tion and reprocessing” is not, on the face of it, an imagery treatment, it can be 
straightforwardly explained in terms of the role of mental imagery in epi- 
sodic memory. 

As we know from a number of studies (see Chapter 3), mental imagery is a 
central ingredient of episodic memory. We also know that episodic memory 
is constructive (see Chapter 21): remembering an event we have experienced 
is not the mere accessing of the memory trace, but the active construction of a 
memory. Remembering an event changes the way it is encoded (so next time 
you will remember it differently). Finally, we have also seen the role of eye 
movements in mental imagery (see Chapter 7). If we put these three pieces 
together, this explains why “eye movement desensitization and reprocessing” 
works the way it does. 

When the subject recalls the traumatic event while moving their eyes, this 
reduces the vividness of the mental imagery involved in that episodic mem- 
ory, given that incongruous eye movement reduces the vividness of mental 
imagery. And as the very act of remembering an event changes its representa- 
tion in memory, when this memory gets re-encoded, it will be encoded in a 
less vivid manner. 

Further techniques involving imagery consist of changing one’s mental 
imagery in various ways, for example by questioning its validity/reality, or by 
re-describing it in different terms, or just by manipulating it, the way one can 
imagine a cupcake with a cherry on the top and then replace, in imagination, 
the cherry with a raspberry. 

In the treatment of some (non-restrictive) eating disorders, the desirability
of food items can be regulated with the help of voluntary mental imagery of
unrelated food items. For example, imagining the taste and smell of desirable
food makes the subjects choose a smaller portion of another food item that
they also like (and that is in front of them; see Cornil and Chandon 2016; see
also Chapter 25; Harvey et al. 2005; Andrade et al. 2012 on the motivating
role of imagery in craving).


1 “Recent” here means something that has happened less than six hours ago, which is widely held
to be the window of memory consolidation.

Finally, imaginal exposure means that the usually involuntary imagery 
(in post-traumatic stress disorders, phobias, or panic attacks) is conjured up 
voluntarily. For example, phobia of spiders can be treated very quickly and 
efficiently by imaginal exposure alone—even one session of 10 minutes of 
voluntarily conjuring up mental imagery of spiders leads to a much less 
intense emotional reaction to spider-related stimuli one week later (Hoppe 
et al. 2021). 

It is important that almost all the clinical procedures described so far rely 
on the manipulation of the conscious and voluntary mental imagery of the 
subjects: they ask the patients to visualize a certain event. But as we have seen, 
this is just one way of triggering mental imagery and one that is dependent on 
a lot of factors that might prevent the patient from succeeding in visualizing 
what she is asked to visualize. Voluntary mental imagery is hard to maintain 
and even harder to control. If the experimenter asks me to visualize a spider, 
I may or may not do it. And even if I do visualize it, I may just have a quick 
flash of a spider and then I just think of something else instead. 

Inducing involuntary mental imagery, on the other hand, could bypass 
these blocks and it could provide a more efficient way of interfering with the 
patients’ mental imagery. This is what happens in the Tetris experiment, for 
example, which is one of the most successful and widely replicated uses of 
mental imagery in clinical practice. But, given that mental imagery can also 
be automatically induced by means of crossmodal activation, there are many 
more options for bypassing the worries about voluntary mental imagery.

The role involuntary imagery can play is especially promising in the light of 
findings according to which voluntary visualization suppresses non-visual 
sensory activation (Amedi et al. 2005). So, voluntarily visualizing an apple 
prevents activations in the olfactory or gustatory sense modalities. Given the 
multimodal nature of perceptual episodes, this makes voluntary visualizing 
much more impoverished than perceiving when it comes to non-visual sense 
modalities and this, in turn, can weaken the effect of imagery treatment.
Inducing mental imagery crossmodally (hence, involuntarily) obviously does
not have these undesired consequences.


2 While these techniques work very well on average, there is great variability of the efficiency of this
method between subjects (see, for example, Williams et al. 2013; Blackwell et al. 2013).

Here is a demonstration of how involuntary mental imagery can be used in 
a much more efficient manner than voluntary imagery. The most widespread 
cause of tinnitus is the malfunctioning or reorganization of the auditory
cortex (Mühlnickel et al. 1998—I set aside some rarer forms of tinnitus, like
“objective tinnitus”). And one of the quickest ways of treating tinnitus is by 
playing music with the tinnitus tone and its octaves blocked out. So, if the 
tinnitus is a C# tone, then the subjects are made to listen to tunes they know 
well, but each time a C# note would come up, there is just silence (Okamoto 
et al. 2010). This is clearly a case of auditory mental imagery. The subjects 
have auditory imagery of the C# tone, and this auditory imagery is involun- 
tarily triggered (by the surrounding non-C# notes). And this involuntarily 
triggered auditory imagery of the C# tone weakens the tinnitus already after a 
relatively short period of time. Voluntarily forming auditory imagery of the 
C# tone has no such effect. 

With this understanding of clinical uses of mental imagery, I want to go 
back to two important imagery phenomena I discussed earlier in the book 
and draw some conclusions about what we can do about them. I start with the 
continuation of the argument in Chapter 29 concerning implicit bias. While 
implicit bias is not a clinical condition and counteracting implicit bias is not a 
clinical procedure, understanding the role of mental imagery in clinical con- 
texts can help us understand not only how to and how not to counteract 
implicit bias, but also what implicit bias is. 

Among the most efficient ways of manipulating implicit bias, we find many 
techniques that manipulate mental imagery. For example, visualizing or put- 
ting ourselves imaginatively in the shoes of a member of another racial or 
gender group can reduce implicit bias significantly. And, crucially, the extent 
of this reduction correlates with the details and vividness of the imagery 
involved (Blair et al. 2001; Blair 2002; Lai et al. 2014; see also Peck et al. 2013 
for further relevant findings and Markland et al. 2015 for the impact of men- 
tal imagery on implicit preferences more generally). 

But why would evoking mental imagery (in this unreliable way, which is 
difficult to control and maintain) be an efficient way of counteracting implicit 
bias? It is difficult for the propositionalist to explain this: if the biasing repre- 
sentation is a propositional attitude, then propositionalists would need to 
explain why mental imagery—a perceptual process—has a direct impact on it 
(while mental imagery does not routinely justify beliefs, as we have seen in 
Chapter 24). Even more importantly, they would also need to explain why
mental imagery has more impact on it than other perceptual processes (like
actual stimulation-driven perception). 

Similarly, according to associationism, the best way to unlearn an association
is extinction and extinction is achieved by repeated exposure to perceptual
stimuli that go against the association (if the association is between A
and B, then the extinction would involve exposure to A and non-B or non-A 
and B). But the research I cited shows that manipulating mental imagery is a 
more efficient way of counteracting implicit bias than extinction and it is 
unclear how the associationist could explain this. 

The mental imagery view is, obviously, well-suited to explain this—if the 
biasing representation is mental imagery, then it should not come as a
surprise that the bias can be reversed by manipulating the subjects’ mental
imagery. 

It is important that this procedure relies fully on the manipulation of the 
conscious and voluntary mental imagery of the subjects: they ask the patients 
to visualize certain faces, or imagine themselves to be certain people con- 
sciously and voluntarily. But as we have seen, this is just one way of triggering 
mental imagery and one that is dependent on a lot of factors that might pre- 
vent the subject from succeeding in visualizing what she is asked to visualize. 
It is also difficult to control whether the subject does in fact visualize the out- 
group face she is asked to visualize. Finally, such visual imagery is difficult to 
maintain for longer than a couple of seconds. Nonetheless, in spite of all these 
practical problems, the imagery-involving procedure is among the most effi- 
cient ways of reducing implicit bias. Using the considerations in favor of 
involuntary and crossmodally triggered mental imagery can help us consider- 
ably in counteracting implicit bias more efficiently. 

Yet another phenomenon where the clinical uses of mental imagery could 
be of help is pain (see Chapter 17). More and more empirical research has 
been focusing on the role of mental imagery in pain treatment. One of the 
most promising trends, both in the neuroscience of pain and in psychiatric 
treatments of chronic pain, is the focus on mental imagery. Many patients 
with chronic pain report involuntary mental imagery connected with the 
pain and some of them also report developing coping mental imagery 
(Winterowd et al. 2003; Berna et al. 2012; Gosden et al. 2013). Finally, one of 
the most efficient ways of treating chronic pain is to alter the mental imagery 
of patients (Moseley 2004, 2006; MacIver et al. 2008; Philips 2011; Fardo et al. 
2015; Volz et al. 2015). 

Here is one illustrative example. Berna et al. (2011) and Berna et al. (2012) 
give the case study of a 47-year-old woman with chronic pelvic pain, who had
recurrent spontaneous mental imagery of a burning hole at the locus of the
pain. This intrusive spontaneous mental imagery was not detachable for her 
from the pain itself. She also developed coping imagery of a hot water bottle 
applied on the locus of the pain, which helped her a great deal. This is not an 
isolated example (see MacIver et al. 2008; Fardo et al. 2015; and Volz et al. 
2015 for very similar case studies). 

This coping imagery (like the intrusive recurrent mental imagery of a 
burning hole at the locus of the pain) is often involuntary. But modifying this 
involuntary coping imagery (and making it compete with the intrusive 
imagery more efficiently) can help the patient even more (see Berna et al. 
2012). Coping imagery has also been extensively used for preparing patients 
before surgery (Tusek et al. 1997). All these findings about the importance of 
imagery in pain treatment lend further support to the claim I made in 
Chapter 17 about the central role of imagery in pain perception. 

A lot has been said about using imagery to promote mental health in self- 
help circles (see, for example, Rossman 2000). My dentist told me last week to 
imagine being on the beach listening to the waves crashing. It didn’t help. So 
there would be good reason for skepticism about the health benefits of 
imagery. But part of the practical benefits of the book is that by clarifying 
what role mental imagery plays in our mental life, it makes it easier to find out 
how and why mental imagery can help us in clinical contexts. 


PART VI 
APPENDIX 


31 
Mental Imagery in Art 


The importance of mental imagery can be traced beyond the confines of 
neuroscience, psychology and philosophy of mind.¹ To show the reach of the
concept, I want to explore the role mental imagery plays in a philosophical 
subdiscipline that, at first glance, may seem as far removed from neurosci- 
ence, psychology, and philosophy of mind as possible: aesthetics. 

Mental imagery plays an important role in our engagement with, and 
appreciation of, artworks, which makes mental imagery a crucial concept in 
aesthetics (see also Lopes 2003; Nanay 2016d; Stokes 2019). While mental 
imagery may also play a crucial role in artistic creation, as many artists and 
composers like to emphasize, I will focus here on the importance of mental 
imagery in engaging with artworks. 

A property is aesthetically relevant if attending to it makes an aesthetic dif- 
ference (Nanay 2016d). This aesthetic difference can be of various kinds: 
prompting an aesthetic experience (whatever that may be), strengthening or 
weakening our identification with a fictional character, triggering a frisson 
(Nanay forthcoming d), appreciating a narrative twist, and so on. 

Here is an example from Nanay 2019b. If you look at Bruegel’s The Fall of 
Icarus, without knowing the title and without knowing much about the paint- 
ing, you probably see a nice diagonal composition, half landscape, half sea- 
scape, with a peasant at the center. But if you know that it is supposed to 
depict the fall of Icarus (presumably because you've read the title), you will 
probably start looking. Where is Icarus? I don’t see anyone falling. You fever- 
ishly scan the picture for some trace of Icarus and then you find him (or at 
least his legs) just below the large ship. You are now attending to that property 
and this makes a significant aesthetic difference in your experience of the pic- 
ture. The whole picture will look very different now. So the depiction of 
Icarus’s legs would count as an aesthetically relevant property. 

It should be clear that aesthetically relevant properties are not the same as
aesthetic properties: properties like being beautiful, being graceful, or being
ugly. Icarus’s legs are not aesthetic properties: they are neither pretty nor ugly.
Aesthetic properties are notoriously difficult to define (Sibley 1959). Aesthetically
relevant properties are much less complicated: any property can be an
aesthetically relevant property as long as attending to it makes an aesthetic
difference to your experience.


1 One such field of research is ethics, where we now know that mental imagery plays an important
role in moral judgment (Amit and Greene 2012).

The crucial question from our point of view is how aesthetically relevant 
properties are represented. The Icarus example shows that they can be repre- 
sented perceptually: we see Icarus’s legs and seeing them makes an aesthetic 
difference. But not all aesthetically relevant properties are perceptually repre- 
sented. Some are clearly non-perceptual. If we think that a painting was 
painted by Vermeer and then find out that it is a forgery, this may make an 
aesthetic difference to our experience. But the property of being painted by 
Vermeer is not a perceptually represented property, regardless of how liberal 
we are with what properties are perceptually represented (Siegel 2006, 2007; 
Masrour 2011; Nanay 2011a, 2011d, 2012c, 2012d; see also Stokes 2014). 

From the point of view of this book, the most important cases are those 
where aesthetically relevant properties are represented by means of mental 
imagery. I will argue that there are very many of these and, as a result, we 
should take mental imagery to be a key concept in aesthetics (see also Nanay 
forthcoming f). 

Before turning to the specific arts, I want to highlight the aesthetic rele- 
vance of two distinctions that I have been using throughout the book. The 
first one is between determinable and determinate imagery: sometimes art- 
works aim to evoke mental imagery that is very determinate, but other times 
the evoked mental imagery is deliberately determinable. This difference is 
very significant in different artistic traditions. For example, literary and picto- 
rial modernism almost obsessively opts for determinable mental imagery, as 
we shall see below. 

The second familiar distinction is about whether imagery is influenced in a 
top-down manner. As we have seen, mental imagery can be influenced in a 
top-down manner, hence, understanding mental imagery is crucial for 
explaining how our beliefs and knowledge show up in our engagement with 
artworks. Here is Marcel Proust, whose novel contains what I consider to be 
the most rigorous and most insightful account of mental imagery in art 
appreciation (and in general), making the same distinction: 


If I could stand still for a second to give a closer look to everything, to all 
details, I would see a blemish on her nose, traces of rash on her skin, an 
awkward smile, a clueless glance, maybe a bulging belly and not what I had
imagined; for each time I saw a pretty face or a graceful line, I would
complete it charitably with a beautiful shoulder or a charming gaze, on the basis
of a memory or pre-imagination, which I had always carried with me, 
although seeing a living creature glanced only for a second could be as mis- 
leading as the quick reading of texts, when reading one syllable without see- 
ing what follows it prompts us to complete the word in a way that is dictated 
by our memory only.²


We can now examine how mental imagery colors and sometimes even constitutes
our engagement with art.³ I start with visual art and music and then turn
to literature and conceptual art. 

A somewhat obvious way in which mental imagery plays a role in our 
engagement with visual arts follows from the simple fact that most pictorial 
art does not normally encompass the entire visual field. So those parts of the 
depicted scene that fall outside the frame could be, and very often are,
represented by means of mental imagery. We have already seen two examples of
this in Chapter 20. The first one was Degas, whose paintings often feature 
protagonists who are placed in a way that only parts of them are inside the 
frame. The rest we need to complete by means of mental imagery. The second 
example was Buster Keaton, who also uses the viewer’s mental imagery of the 
off-screen space in his films, but normally for comical effects. The use of
off-screen space in a certain tradition of art films has been analyzed by Noël
Burch, with special emphasis on Jean Renoir’s Nana (1926) and some Ozu,
Antonioni, and Bresson films (Burch 1973, pp. 17-31), especially concerning 
the technical details of how these artists direct our attention to aesthetically 
relevant properties (of various kinds) that fall outside the frame in these 
works, and the role mental imagery plays in this process (see also Bonitzer 
1971-1972; Saxton 2007). 

But mental imagery is also often used within the picture frame. Michael 
Baxandall assembled a great variety of fifteenth-century sources about the 
importance of visual imagery in the ways in which fifteenth-century Italian 
observers engaged with religious pictures, especially pictures of the Madonna. 
As Baxandall’s textual evidence, for example from the treatise “Zardino de
Oration” written in 1454, shows, fifteenth-century observers filled in the details
of paintings that were left intentionally underspecified or blank by the painter
with their own personal mental imagery and this explains some of the visual
features of, for example, depictions of the Madonna’s face (Baxandall 1972).


2 Marcel Proust: À l’Ombre des Jeunes Filles en Fleurs. 1919, p. 457.

3 This claim is less surprising if we consider non-Western aesthetic traditions. A key concept in
Japanese aesthetics is that of “hidden beauty” or Yugen, the appreciation of which involves something
akin to mental imagery (of the hidden and incomplete aspects) (Saito 1997). And the eleventh-century
Islamic philosopher Ibn Sina also heavily emphasized the importance of imagery in our experience of
beauty (Gonzales 2001). See Nanay (2022d and forthcoming a) on the importance of mental imagery
in non-Western aesthetic traditions.

To move a bit closer to the present, in the 1950 American film Harvey, the 
character played by Jimmy Stewart is an alcoholic and he hallucinates a
six-foot three-and-a-half-inch-tall rabbit (or pooka...). We don’t see anyone, but
the Jimmy Stewart character clearly does. And, crucially, all the scenes with 
the imaginary rabbit are framed as if there really were a rabbit in them. So 
when we see the Jimmy Stewart character in an armchair having a conversa- 
tion with Harvey, this shot is framed in a way as if there really were a six-foot- 
tall creature next to him. This framing is aesthetically relevant and its choice 
clearly relies on the viewer’s mental imagery. 

In this example, we have a fairly good idea what we're supposed to form 
mental imagery of—the Jimmy Stewart character gives a fairly accurate
description of Harvey's alleged appearance. But there are examples where 
we are in a much less fortunate epistemic situation. One classic example is 
Buñuel’s Belle de Jour, where the Chinese businessman shows a little box to
the Catherine Deneuve character, who is clearly fascinated by what is inside. 
She sees it, he sees it, but we, the viewers, don’t. There is a humming voice
coming from the box, but we never see what is inside. We have a very indeter- 
minate (crossmodally triggered) visual mental imagery of what could possi- 
bly be in the box—whatever is in the box is left intentionally indeterminate. 

The French film director Robert Bresson often uses mental imagery in this
indeterminate manner, so much so that he even takes this use of mental 
imagery to be the mark of a “good” director (or, as he would put it, of a cine- 
matographer, not merely of a director): “Don't show all sides of the object. 
A margin of indefiniteness” (Bresson 1975/1977, p. 52). 

One relatively simple use of mental imagery inside the frame comes from 
the abundance of occlusion in most everyday perceptual scenes (and, as a 
result, in most depicted scenes). Hiding aesthetically relevant properties in 
occluded parts of depicted objects has a long history: Rogier van der Weyden
plays with this in his Seven Sacraments (Antwerp), where he depicts one of 
the characters in a way that only the tip of his nose and chin are visible. 
Antonioni’s L’Eclisse (1962) uses occlusion in a way that is clearly aesthetically
relevant—for example when we first see the two protagonists in the same 
frame both half occluded by the same giant column. And Godard’s Vivre sa 
Vie (1962) starts with a long (seven-minute) scene where the two main
protagonists are filmed from behind—we hear their conversation, but we do not
see their faces. We need to use mental imagery to represent very important
aesthetically relevant properties. 

Some less high-brow examples: Monty Python’s How not to be seen sketch 
relies entirely on the comic effects of mental imagery that we use to represent 
occluded people. Also, in the sitcom Seinfeld, one of the recurring characters, 
Mr. Steinbrenner, is only ever shown occluded. We sometimes see his head
from behind, but his face is occluded. When we first see him, we only see his 
hand when he shakes hands with George, but the rest of his body is occluded 
behind a wall. And we sometimes see the shadow of his profile, but the only 
way we can represent his face is by means of mental imagery. 

In the Seinfeld example (and in the Godard example as well), the use of 
occlusion is really a game or a running (visual) gag. But it can also be used in 
a more disconcerting manner, where the occluded parts of the scene are rep- 
resented as something that is potentially dangerous or uncertain. Marguerite 
Duras’s India Song (1975) is a clear example, where the vast majority of the 
shots have a large occluded space, typically another room, in the background, 
where something potentially important could be happening, but we never see 
what that is. René Magritte’s paintings and Andres Serrano’s or Issei Suda’s
photographs almost always hide some aesthetically relevant features behind 
an occluder in a way that we can only form very indeterminate mental 
imagery of what is occluded (see also Nanay 2019a on the role mental imagery 
plays in some works of portraiture). And Apichatpong Weerasethakul’s films 
use this effect as the general emotional background that creates a sense of 
anxiety because we have no idea what is hidden behind, say, the jungle in 
Tropical Malady (2004). 

Again, this, unlike the Seinfeld example, saddles us with deliberately inde- 
terminate mental imagery. There is, presumably, a fact of the matter about 
how Mr. Steinbrenner looks (and the same goes for how Nana looks in 
Godard’s Vivre sa Vie). But there is no fact of the matter about what is in the 
box in Belle de Jour or in the next room in India Song or in the jungle in 
Apichatpong Weerasethakul’s films. And this is what makes these aestheti- 
cally relevant properties that are represented by means of indeterminate 
mental imagery disconcerting. As Proust says, “It’s so soothing to be able to
form a clear picture of things in one’s mind. What is really terrible is what one 
cannot imagine”.⁴


4 Marcel Proust: Swann’s Way (1913) (trans. C. K. Scott Moncrieff). New York: Modern Library,
1928, p. 525.

The next form of imagery I want to examine is multimodal mental imagery.
While it was not referred to in these terms, multimodal mental imagery was a 
significant theme in classical aesthetic theory, starting with Lessing, Goethe, 
and Schopenhauer, who all argued against the use of multimodal mental 
imagery in visual art as it dilutes the purity of art forms (Gombrich 1964). So 
they insisted that visual works should not evoke auditory mental imagery (for 
example, by depicting someone screaming). 

But multimodal mental imagery has become extremely widespread in the 
last 150 years or so of visual art. We have already seen one example of multi- 
modal mental imagery in the Belle de Jour scene. But it is difficult to overem- 
phasize the importance of this way of using aesthetic imagery in certain art 
films. Early film theorists, who were exasperated by the advent of talkies in 
the late 1920s, took the use of what I call multimodal mental imagery to be 
one of the saving graces of the invention of sound (see, for example, 
Balazs 1930). 

Multimodal mental imagery became a hallmark of 1960s European mod- 
ernist art films. In some of his films, Jean-Luc Godard used sound primarily 
as a prompt for triggering visual mental imagery (see Levinson’s 2016 sensi- 
tive analysis of the use of sound in Masculin/Feminin (1966) from this point 
of view). And both Bresson and Michelangelo Antonioni used sound this way 
for much of their careers, and they were also very explicit about this way of
using sound in their theoretical writings and interviews. As Bresson said, 
“The eye solicited alone makes the ear impatient, the ear solicited alone makes 
the eye impatient. Use these impatiences” (Bresson 1975/1977, p. 28) and 
“A locomotive’s whistle imprints on us a whole railroad station” (Bresson 
1975/1977, p. 39). And here is Antonioni giving a textbook definition of mul- 
timodal mental imagery: “When we hear something, we form images in our 
head automatically in order to visualize what we hear” (Antonioni 1982, p. 6). 
Both Bresson and Antonioni use multimodal mental imagery that is indeter- 
minate and that is also very much emotionally charged. As a last quote, to 
illustrate the emotional potentials of multimodal mental imagery, here is 
Proust again: 


“The senses are chasing each other so that you can enjoy scent, flavor and
touch without the help of the hands or the lips; and this art of the
intertwining, makes it possible [...] to conjure up forbidden caresses,
touches and tastes from the color of faces or breasts”.⁵


5 Marcel Proust: À l’Ombre des Jeunes Filles en Fleurs. 1919, p. 572.

As a counterbalance to this high-brow overkill, it needs to be emphasized
that multimodal mental imagery can also be used in a very different manner 
and still be aesthetically relevant. As Ridley Scott repeatedly emphasizes in his 
interviews about his Alien trilogy, the Alien is shown relatively rarely because 
having mental imagery of it is much scarier than seeing it. This general credo 
has been used in suspense for a long time (from Hitchcock films to Jaws). 
Finally, the recurring joke on Friends about the ugly naked guy who lives 
across the street (but whom we never see) clearly utilizes multimodal mental 
imagery. 

Next up: temporal mental imagery. Aesthetically relevant properties are 
often represented by means of temporal mental imagery. In Henri Cartier- 
Bresson’s Behind Saint-Lazare Station (Paris, 1932), we see a man jumping 
across a puddle, with moderate success. What we see in the picture is a man 
in the air. But the mental imagery of his landing in the puddle is very much 
aesthetically relevant. 

Further, in the case of some of Vermeer’s paintings, what is striking is that 
while the paintings depict an action (woman pouring milk from a jug, mea- 
suring something on a scale, reading a letter, and so on), the mental imagery 
of the scene a second ago and the mental imagery of the scene in a second 
would look exactly the way the picture looks. In sharp opposition to the 
Cartier-Bresson example, in these Vermeer paintings, the temporal mental 
imagery does not represent something different from what we already see in 
the picture. And this clearly adds to the tranquility of these paintings. 

A special case of temporal mental imagery would deserve much longer dis- 
cussion: some of our expectations in temporal art forms amount to mental 
imagery. This is a well-researched topic in music psychology, where some 
expectations clearly count as mental imagery, in the sense of early auditory 
processing that is not directly triggered by auditory sensory input (Kraemer 
et al. 2005; Zatorre and Halpern 2005; Leaver et al. 2009; Herholz et al. 2012; 
Yokosawa et al. 2013; see also Judge and Nanay 2021 for a philosophical
summary). The same goes for some expectations in film as well (the classic 80s
comedy Top Secret! (1984) being a treasure trove of violated visual
expectations that clearly involve temporal mental imagery; see Chapter 11 for
a typical example).

But it needs to be noted that this does not mean that all of our expectations 
concerning an artwork would amount to mental imagery. When I go to see 
the new James Bond film, I have a firm expectation that Bond will not die at 
the end (as we have learned recently, we may be terribly wrong about this...). 
But this has nothing to do with mental imagery. Nonetheless, at least some 
expectations (of the more immediate kind) would amount to mental imagery
and some uses of them (for example, various violations of these expectations)
are clearly aesthetically relevant. 

Mental imagery also plays a crucial role in our appreciation of music, pri- 
marily as a result of the importance of musical expectations, which are a form 
of auditory mental imagery (but see also the importance of multimodal men- 
tal imagery in musical listening, summarized in Nanay 2023). Expectations 
play a crucial role in our engagement with music. When we are listening to a 
song, even when we hear it for the first time, we have some expectations of 
how it will continue. And when it is a tune we are familiar with, this expecta- 
tion can be quite strong (and easy to study experimentally). When we hear 
Ta-Ta-Ta at the beginning of the first movement of Beethoven's Fifth 
Symphony in C minor, Op. 67 (1808), we will strongly anticipate the closing 
Taaaam of the Ta-Ta-Ta-Taaaam. Many of our expectations are fairly
indeterminate: when we are listening to a musical piece we have never heard before,
we will still have some expectations of how a tune will continue, but we don't 
know what exactly will happen. We can rule out that the violin glissando will 
continue with the sounds of a beeping alarm clock (unless it’s a really avant- 
garde piece...), but we can't predict with great certainty how exactly it will 
continue. Our expectations are malleable and dynamic: they change as we 
listen to the piece (Judge and Nanay 2021). 

Expectations are mental states that are about how the musical piece will 
unfold. So they are future-directed mental states. But this leaves open just 
what kind of mental states they are—how they are structured, how they repre- 
sent this upcoming future event and so on. At least some forms of expecta- 
tions in fact count as mental imagery. And musical expectations (of the kind 
involved in examples like the Ta-Ta-Ta-Taaaam) count as auditory temporal 
mental imagery: they are auditory representations that result from perceptual 
processes that are not directly triggered by the auditory input. The listener 
forms mental imagery of the fourth note (“Taaaam”) on the basis of the expe- 
rience of the first three (“Ta-Ta-Ta”) (there is a lot of empirical evidence that 
this is in fact what happens—see Kraemer et al. 2005; Zatorre and Halpern 
2005; Leaver et al. 2009; Herholz et al. 2012; Yokosawa et al. 2013). This men- 
tal imagery may or may not be conscious. But if the actual “Taaaam” diverges 
from the way our mental imagery represents it (if it is delayed, or altered in 
pitch or timbre, for example), we notice this divergence and experience its 
salience in virtue of a noticed mismatch between the experience and the men- 
tal imagery that preceded it. 
The Ta-Ta-Ta-Taaaam example is a bit simplified, so here is a real-life and
very evocative case study, an installation by the British artist Katie Paterson.
The installation is an empty room with a grand piano in it, which plays
automatically. It plays a truncated version of Beethoven’s Moonlight Sonata. The
title of the installation is “Earth-Moon-Earth (Moonlight Sonata Reflected
From The Surface of The Moon)” (2007). Earth-Moon-Earth is a form of
transmission (between two locations on Earth), where Morse code messages are
beamed up to the Moon and reflected back to Earth. While this is an efficient
way of communicating between two far-away (Earth-based) locations, some
information is inevitably lost (mainly because some of the signal does not get
reflected back but is absorbed in the Moon’s craters). In “Earth-Moon-Earth
(Moonlight Sonata Reflected From The Surface of The Moon)” (2007), the piano
plays the notes that did get through the Earth-Moon-Earth transmission
system, which is most of the notes, but some notes are skipped. When you listen
to the music the piano plays in this installation, if you know the piece, your
auditory mental imagery is constantly active, filling in the gaps where the
notes are skipped.

I will say very little about the use of mental imagery in theater, because of 
the obviously huge role it plays there. Peter Brook described theater as taking 
place in an “empty space”: “I can take any empty space and call it a bare stage. 
A man walks across this empty space whilst someone else is watching him, 
and this is all that is needed for an act of theater to be engaged” (Brook 1968). 
This empty space is filled in with the help of our (top-down and often quite 
specific) mental imagery. I will only give one very evocative example of a the- 
ater performance that took place in the Iranian theater space Rooberoo 
Mansion in Tehran (I deliberately omit the name of the group for potential 
censorship complications). The only performer is a woman wearing a burka 
(like many women in public in Iran) and at some point in the performance
she tells the audience that she is completely naked under the burka and raises 
questions about the legality of this. It is difficult to even begin to understand 
the audience’s engagement with the piece without appealing to visual imagery. 

Reading a novel tends to lead to mental imagery in a variety of sense 
modalities. This triggering of mental imagery is typically involuntary: you do 
not need to count to three and voluntarily conjure up the mental imagery of 
the protagonist’s face; instead, you have involuntary mental imagery episodes
somewhat reminiscent of flashbacks (this claim comes with the usual proviso 
that there are huge interpersonal variations in this, and many aphantasics 
don’t report any mental imagery while reading). While this kind of mental
imagery is often visual (when you have imagery of the protagonist’s face or
the layout of the room that they are in), it can also be auditory (of the protag- 
onist’s tone of voice, for example), olfactory, or even gustatory (see Starr 2013 
for a wide-ranging analysis with an emphasis on multimodal mental imagery; 
and Stokes 2019 for the role such mental imagery plays in reading fictional 
works). Further, the more vivid the reader’s mental imagery is, the more likely 
it is that information from the novel is imported into the reader's beliefs about 
the real world (Green and Brock 2000). 

At the end of the first book of In Search of Lost Time, Marcel Proust gives a 
brief but very sophisticated account of how words trigger mental imagery, 
which is also indicative of the way Proust himself manipulates the reader's 
mental imagery. He makes a distinction between names and words and argues 
that names trigger more specific or more determinate mental imagery than 
words. Here is what he says: 


Words present to us little pictures of things, lucid and normal, like the 
pictures that are hung on the walls of schoolrooms to give children an illus- 
tration of what is meant by a carpenter’s bench, a bird, an anthill; things 
chosen as typical of everything else of the same sort. But names present to 
us—of persons and of towns which they accustom us to regard as individ- 
ual, as unique, like persons—a confused picture, which draws from the 
names, from the brightness or darkness of their sound, the colour in which 
it is uniformly painted.⁶


Both names and words lead to mental imagery, but then, in turn, mental 
imagery influences or colors the name or word when we encounter it the next 
time. So throughout the unfolding of the novel, names/words and the mental 
imagery they occasion evolve in parallel, influencing each other. 

Other writers also actively reflect on how they manipulate the reader’s 
mental imagery. George Orwell points out the importance of mental imagery 
in understanding metaphors when he says, in Politics and the English
Language, that “The sole aim of a metaphor is to call up a visual image” (see
also Davidson 1978; Green 2017; see also Liu forthcoming on the importance
of mental imagery in poetry).⁷ We might add that this imagery is often not


6 Marcel Proust: Swann’s Way (1913) (trans. C. K. Scott Moncrieff). New York: Modern Library,
1928, p. 556. 

7 The importance of mental imagery in understanding metaphors has also been influential in 
Islamic aesthetics, going back all the way to the works of the ninth-century aesthetician, Abu Al-‘Abbas 
Tha’lab (Tha’lab 1966).
visual: it can be auditory, olfactory, etc. And here is a final example about
literature from the third part of Roberto Bolaño’s novel 2666 (“The Part about
Fate”). This part of the book introduces a New York-based journalist, Oscar 
Fate. After about eighty pages of description of Fate's life in New York City, it 
is revealed that he is in fact African American. This comes after very explicit 
nudges to form mental imagery of him as Caucasian, confronting the readers 
with their own implicit racial bias. 

While discussions of mental imagery crop up in most fields of aesthetics 
and art history (including by some of the most influential art historians, like 
George Kubler; see Kubler 1987), the role of mental imagery is probably the 
most salient if we turn to conceptual art. Many conceptual artworks actively 
try to engage our mental imagery in an unexpected manner. Here are two 
illustrative (and famous) examples, but the point can be generalized. 

Marcel Duchamp’s L.H.O.O.Q. Rasée (1965) is a picture that is perceptually
indistinguishable from a faithful reproduction of Leonardo's Mona Lisa. But
Duchamp earlier made another picture (L.H.O.O.Q.) where he drew a
mustache and beard on the picture of Mona Lisa. Duchamp’s L.H.O.O.Q. Rasée
(because of its title, where “rasée” means “shaven”) is a reference to this earlier
picture and we, presumably, see it differently from the way we see Leonardo's
original: the missing mustache and beard are part of our experience, whereas
they are not when we look at Leonardo’s original. And it is difficult to see how we
can describe our experience of L.H.O.O.Q. Rasée without some reference to
the mental imagery of the missing beard and mustache. What is interesting in 
this example is that the mental imagery of the beard and mustache is influ- 
enced in a top-down manner, not only by our prior knowledge (about how 
the world is) but also by our prior art historical knowledge. 

The second example is Robert Rauschenberg’s Erased de Kooning Drawing
(1953), which is just what it says it is: all we see is an empty paper (with barely
visible traces of the erased drawing on it). Again, it is difficult to look at this
artwork without trying to discern what drawing might have been there before 
Rauschenberg erased it. And this involves trying to conjure up mental 
imagery of the original drawing. Again, these are two classic examples. But 
there are more. The vast majority of Ai Weiwei’s works, for example, rely 
heavily on our mental imagery. 

In fact, it is not easy to find an example of a conceptual artwork where 
mental imagery plays no role. But there are some. One example would be 
Robert Barry’s All the things I know, which is nothing but the following sen- 
tence written on the gallery wall with simple block letters: “All the things I 
know but of which I am not at the moment thinking—1:36 PM; June 15, 
1969.” I’m really not sure that this work has much interest in enticing the
viewer’s mental imagery. It is going for a much more cerebral effect.
Nonetheless, in the vast majority of conceptual artworks, mental imagery is a 
necessary feature of appreciating the artwork. 

I hope I managed to show how much mental imagery can matter in aes- 
thetics. And also how it can matter in many other touchy-feely parts of life. It 
is not just philosophers, psychologists, and neuroscientists who should take 
this concept more seriously. We all should. 


Afterword 


Is There Anything That Is Not Mental Imagery? 


In some ways, this book must be a frustrating read. I went through a great 
number of mental phenomena and argued that they are all just mental 
imagery or at least heavily depend on mental imagery. Here are some such 
mental processes or phenomena: synesthesia, sensory substituted vision, 
echolocation, hallucination, attentional templates, pain, perception per se, 
amodal completion, object files, much of memory, boundary extension, emo- 
tions, desire, cognitive dissonance, and implicit bias. And this is not all. Not 
even all that I talked about in this book. 

Further, I deliberately avoided writing in this book about some of the men- 
tal phenomena that also have a lot to do with mental imagery. Meditation and 
altered states of consciousness have been shown to involve a fair amount of 
mental imagery (Kozhevnikov et al. 2009), but I decided not to include these 
in the book, given that the empirical work is not very well-developed on this. 
The same goes for creativity, where mental imagery has long been thought to 
play an essential role (see Nanay 2014c for a critical summary). And while I 
mentioned dreams and hallucinations as clear examples of mental imagery, I 
did not say too much about them (but see Nanay 2016a; Fazekas et al. 2021). 

An obvious question then—and I am sure a question that at one point in 
reading the book, readers asked themselves—would be: What is not mental 
imagery? Is there anything left in the mind that is not just straight-out mental 
imagery? In various previous writings, I railed against approaches to the mind
that force one unitary explanatory scheme on all mental functions (see, for 
example, Nanay 2013a). Am I doing the same with mental imagery? 

In this afterword, I want to highlight what is not mental imagery. Mental 
imagery is a form of perceptual processing. And even within the category of 
perceptual processing, it is a fairly narrow slice: it is representational process- 
ing. So non-representational (say, retinal) perceptual processing is not mental 
imagery. Nor is post-perceptual processing (however one draws the line 
between perceptual and post-perceptual processing). Finally, and most 
importantly, mental imagery is (representational) perceptual processing that 
is not directly triggered by sensory input. In other words, any kind of
perceptual processing that is directly triggered by sensory input will not count as
mental imagery. 

In short, there is a lot in our mind that is not mental imagery: anything 
non-perceptual is automatically out. But even within perception, anything
too late, anything too early, and anything driven by sensory stimulation is also
automatically out. The vast majority of our mental processes are not mental
imagery. 

But, and this is really what the book was about, many of our mental pro- 
cesses have a lot to do with mental imagery because many of those mental 
processes that are not mental imagery nonetheless are intricately intertwined 
with mental imagery. To take just one example from one of the last chapters, a 
desire is not mental imagery: it is something completely different. But I 
argued that mental imagery plays a crucial role in explaining desires. 

To put it differently, mental imagery is a neatly delineated and somewhat 
narrow perceptual process. But it plays a crucial role in a surprisingly large set 
of our mental phenomena. And I tried to identify some of these in the book. 

But then another question arises: sure, there are lots of mental processes 
that are not mental imagery, but is everything in our mind explained by men- 
tal imagery? Is it the case that if we have understood mental imagery, we have 
thereby understood the whole mind? 

While I tried to argue that we can explain a lot of our mind with the help 
of mental imagery, it is important to emphasize that mental imagery is not a 
silver bullet for explaining the entire mind. Especially in Part IV and Part V 
of the book, I tried to push explanations in terms of mental imagery as far as 
I could, so it is important to show the limits of such explanations. 

Although some have argued otherwise (see, for example, Barsalou 1999), 
I don't see any overwhelming evidence that abstract or mathematical reason- 
ing would rely on mental imagery in any explanatorily meaningful manner. 
I did highlight some role that imagery can and does play here in Chapter 19 
(see also Mancosu 2005), but I don’t think there is evidence that mental
imagery drives or explains abstract or mathematical reasoning. 

More generally, while I argued for the importance of imagery in decision- 
making, I don't think rational reasoning is explained by mental imagery. So 
right there we have a major exception to the explanatory sweep of mental
imagery. You might think that rational reasoning constitutes a big chunk of 
the mind. I would vehemently disagree: I happen to think that very little of what 
goes on in our mind is rational reasoning and there are empirical reasons for 
thinking that this is so (see Chapter 19 and Nanay 2021b for some references). 
Even more importantly, most of the things we care about in life have little to
do with rational reasoning and a lot to do with mental imagery. 

As an illustration, let me close the book with the example I began with. Henri
Cartier-Bresson, prisoner of war, sitting in a hut, visualizes the sea behind the 
mountain range he can see from his window. This mental imagery colors his 
perception of the mountain range. And it also gives a positive valence to this 
perceptual state. And this makes him happier, more hopeful. This is a great 
example of how mental imagery can make a positive difference in our life. As 
we have seen from a number of case studies in psychiatry, it can also make a
tragically negative difference. We are better off paying attention to how
it works and how it can impact our perception, our mind, and our life. 